Writing an OS in Rust https://os.phil-opp.com This blog series creates a small operating system in the Rust programming language. Each post is a small tutorial and includes all needed code. Updates in March 2020 Wed, 01 Apr 2020 00:00:00 +0000 https://os.phil-opp.com/status-update/2020-04-01/ <p>This post gives an overview of the recent updates to the <em>Writing an OS in Rust</em> blog and the corresponding libraries and tools.</p> <p>I focused my time this month on finishing the long-planned post about <a href="https://os.phil-opp.com/async-await/"><strong>Async/Await</strong></a>. In addition to that, there were a few updates to the crates behind the scenes, including some great contributions and a new <code>vga</code> crate.</p> <p>As mentioned in the <em>Async/Await</em> post, I’m currently looking for a job in Karlsruhe (Germany) or remote, so please let me know if you’re interested.</p> <h2 id="blog-os"><code>blog_os</code></h2> <p>The repository of the <em>Writing an OS in Rust</em> blog received the following updates:</p> <ul> <li><a href="https://github.com/phil-opp/blog_os/pull/763">Update linked_list_allocator to v0.8.0</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/764">Update x86_64 dependency to version 0.9.6</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/767">New post about Async/Await</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/774">Discuss the approach of storing offsets for self-referential structs</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/782">Use a static counter for assigning task IDs</a></li> </ul> <p>In addition to the changes above, there were a lot of <a href="https://github.com/phil-opp/blog_os/pulls?q=is%3Apr+is%3Aclosed+created%3A2020-03-01..2020-04-02+-author%3Aphil-opp+">typo fixes</a> by external contributors.
Thanks a lot!</p> <h2 id="x86-64"><code>x86_64</code></h2> <p>The <code>x86_64</code> crate provides support for CPU-specific instructions, registers, and data structures of the <code>x86_64</code> architecture. In March, there was only a single addition, which was required for the <em>Async/Await</em> post:</p> <ul> <li><a href="https://github.com/rust-osdev/x86_64/pull/138">Add an enable_interrupts_and_hlt function that executes <code>sti; hlt</code></a> <span class="gray">(released as v0.9.6)</span></li> </ul> <h2 id="bootloader"><code>bootloader</code></h2> <p>The bootloader crate received two contributions this month:</p> <ul> <li><a href="https://github.com/rust-osdev/bootloader/pull/101">Implement boot-info-address</a> by <a href="https://github.com/Darksecond">@Darksecond</a> <span class="gray">(released as v0.8.9)</span></li> <li><a href="https://github.com/rust-osdev/bootloader/pull/104">Identity-map complete vga region (0xa0000 to 0xc0000)</a> by <a href="https://github.com/RKennedy9064">@RKennedy9064</a> <span class="gray">(released as v0.9.0)</span></li> </ul> <h2 id="bootimage"><code>bootimage</code></h2> <p>The <code>bootimage</code> tool builds the <code>bootloader</code> and creates a bootable disk image from a kernel. It received a RUSTFLAGS-related bugfix:</p> <ul> <li><a href="https://github.com/rust-osdev/bootimage/pull/51">Set empty RUSTFLAGS to ensure that no .cargo/config applies</a></li> </ul> <!-- ## `cargo-xbuild` The `cargo-xbuild` crate provides support for cross-compiling `libcore` and `liballoc`. There were no updates to it this month. ## `uart_16550` The `uart_16550` crate provides basic support for uart_16550 serial output. It received no updates this month. ## `multiboot2-elf64` The `multiboot2-elf64` crate provides abstractions for reading the boot information of the multiboot 2 standard, which is implemented by bootloaders like GRUB. There were no updates to the crate in March. 
--> <h2 id="vga"><code>vga</code></h2> <p>There is a new crate under the <code>rust-osdev</code> organization: <a href="https://github.com/rust-osdev/vga"><code>vga</code></a>, created by <a href="https://github.com/RKennedy9064">@RKennedy9064</a>. The purpose of the library is to provide abstractions for the VGA hardware. For example, the crate allows switching the VGA hardware to graphics mode, which makes it possible to draw on a pixel-based framebuffer:</p> <p><img src="https://os.phil-opp.com/status-update/2020-04-01/qemu-vga-crate.png" alt="QEMU printing a box with “Hello World” in it" /></p> <p>For more information about the crate, check out its <a href="https://docs.rs/vga/0.2.2/vga/">API documentation</a> and the <a href="https://github.com/rust-osdev/vga">GitHub repository</a>.</p> Async/Await Fri, 27 Mar 2020 00:00:00 +0000 https://os.phil-opp.com/async-await/ <p>In this post, we explore <em>cooperative multitasking</em> and the <em>async/await</em> feature of Rust. We take a detailed look at how async/await works in Rust, including the design of the <code>Future</code> trait, the state machine transformation, and <em>pinning</em>. We then add basic support for async/await to our kernel by creating an asynchronous keyboard task and a basic executor.</p> <span id="continue-reading"></span> <p>This blog is openly developed on <a href="https://github.com/phil-opp/blog_os">GitHub</a>. If you have any problems or questions, please open an issue there. You can also leave comments <a href="https://os.phil-opp.com/async-await/#comments">at the bottom</a>.
The complete source code for this post can be found in the <a href="https://github.com/phil-opp/blog_os/tree/post-12"><code>post-12</code></a> branch.</p> <!-- fix for zola anchor checker (target is in template): <a id="comments"> --> <!-- toc --> <h2 id="multitasking"><a class="zola-anchor" href="#multitasking" aria-label="Anchor link for: multitasking">🔗</a>Multitasking</h2> <p>One of the fundamental features of most operating systems is <a href="https://en.wikipedia.org/wiki/Computer_multitasking"><em>multitasking</em></a>, which is the ability to execute multiple tasks concurrently. For example, you probably have other programs open while looking at this post, such as a text editor or a terminal window. Even if you have only a single browser window open, there are probably various background tasks for managing your desktop windows, checking for updates, or indexing files.</p> <p>While it seems like all tasks run in parallel, only a single task can be executed on a CPU core at a time. To create the illusion that the tasks run in parallel, the operating system rapidly switches between active tasks so that each one can make a bit of progress. Since computers are fast, we don’t notice these switches most of the time.</p> <p>While single-core CPUs can only execute a single task at a time, multi-core CPUs can run multiple tasks in a truly parallel way. For example, a CPU with 8 cores can run 8 tasks at the same time. We will explain how to set up multi-core CPUs in a future post. For this post, we will focus on single-core CPUs for simplicity. (It’s worth noting that all multi-core CPUs start with only a single active core, so we can treat them as single-core CPUs for now.)</p> <p>There are two forms of multitasking: <em>Cooperative</em> multitasking requires tasks to regularly give up control of the CPU so that other tasks can make progress.
<em>Preemptive</em> multitasking uses operating system functionality to switch threads at arbitrary points in time by forcibly pausing them. In the following we will explore the two forms of multitasking in more detail and discuss their respective advantages and drawbacks.</p> <h3 id="preemptive-multitasking"><a class="zola-anchor" href="#preemptive-multitasking" aria-label="Anchor link for: preemptive-multitasking">🔗</a>Preemptive Multitasking</h3> <p>The idea behind preemptive multitasking is that the operating system controls when to switch tasks. For that, it utilizes the fact that it regains control of the CPU on each interrupt. This makes it possible to switch tasks whenever new input is available to the system. For example, it would be possible to switch tasks when the mouse is moved or a network packet arrives. The operating system can also determine the exact time that a task is allowed to run by configuring a hardware timer to send an interrupt after that time.</p> <p>The following graphic illustrates the task switching process on a hardware interrupt:</p> <p><img src="https://os.phil-opp.com/async-await/regain-control-on-interrupt.svg" alt="" /></p> <p>In the first row, the CPU is executing task <code>A1</code> of program <code>A</code>. All other tasks are paused. In the second row, a hardware interrupt arrives at the CPU. As described in the <a href="https://os.phil-opp.com/hardware-interrupts/"><em>Hardware Interrupts</em></a> post, the CPU immediately stops the execution of task <code>A1</code> and jumps to the interrupt handler defined in the interrupt descriptor table (IDT). 
Through this interrupt handler, the operating system now has control of the CPU again, which allows it to switch to task <code>B1</code> instead of continuing task <code>A1</code>.</p> <h4 id="saving-state"><a class="zola-anchor" href="#saving-state" aria-label="Anchor link for: saving-state">🔗</a>Saving State</h4> <p>Since tasks are interrupted at arbitrary points in time, they might be in the middle of some calculations. In order to be able to resume them later, the operating system must back up the whole state of the task, including its <a href="https://en.wikipedia.org/wiki/Call_stack">call stack</a> and the values of all CPU registers. This process is called a <a href="https://en.wikipedia.org/wiki/Context_switch"><em>context switch</em></a>.</p> <p>As the call stack can be very large, the operating system typically sets up a separate call stack for each task instead of backing up the call stack content on each task switch. Such a task with its own stack is called a <a href="https://en.wikipedia.org/wiki/Thread_(computing)"><em>thread of execution</em></a> or <em>thread</em> for short. By using a separate stack for each task, only the register contents need to be saved on a context switch (including the program counter and stack pointer). This approach minimizes the performance overhead of a context switch, which is very important since context switches can occur up to 100 times per second.</p> <h4 id="discussion"><a class="zola-anchor" href="#discussion" aria-label="Anchor link for: discussion">🔗</a>Discussion</h4> <p>The main advantage of preemptive multitasking is that the operating system can fully control the allowed execution time of a task. This way, it can guarantee that each task gets a fair share of the CPU time, without the need to trust the tasks to cooperate. This is especially important when running third-party tasks or when multiple users share a system.</p> <p>The disadvantage of preemption is that each task requires its own stack.
Compared to a shared stack, this results in higher memory usage per task and often limits the number of tasks in the system. Another disadvantage is that the operating system always has to save the complete CPU register state on each task switch, even if the task only used a small subset of the registers.</p> <p>Preemptive multitasking and threads are fundamental components of an operating system because they make it possible to run untrusted userspace programs. We will discuss these concepts in full detail in future posts. For this post, however, we will focus on cooperative multitasking, which also provides useful capabilities for our kernel.</p> <h3 id="cooperative-multitasking"><a class="zola-anchor" href="#cooperative-multitasking" aria-label="Anchor link for: cooperative-multitasking">🔗</a>Cooperative Multitasking</h3> <p>Instead of forcibly pausing running tasks at arbitrary points in time, cooperative multitasking lets each task run until it voluntarily gives up control of the CPU. This allows tasks to pause themselves at convenient points in time, for example, when they need to wait for an I/O operation anyway.</p> <p>Cooperative multitasking is often used at the language level, like in the form of <a href="https://en.wikipedia.org/wiki/Coroutine">coroutines</a> or <a href="https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html">async/await</a>. The idea is that either the programmer or the compiler inserts <a href="https://en.wikipedia.org/wiki/Yield_(multithreading)"><em>yield</em></a> operations into the program, which give up control of the CPU and allow other tasks to run. For example, a yield could be inserted after each iteration of a complex loop.</p> <p>It is common to combine cooperative multitasking with <a href="https://en.wikipedia.org/wiki/Asynchronous_I/O">asynchronous operations</a>. 
Instead of waiting until an operation is finished and preventing other tasks from running during this time, asynchronous operations return a “not ready” status if the operation is not finished yet. In this case, the waiting task can execute a yield operation to let other tasks run.</p> <h4 id="saving-state-1"><a class="zola-anchor" href="#saving-state-1" aria-label="Anchor link for: saving-state-1">🔗</a>Saving State</h4> <p>Since tasks define their pause points themselves, they don’t need the operating system to save their state. Instead, they can save exactly the state they need for continuation before they pause themselves, which often results in better performance. For example, a task that just finished a complex computation might only need to back up the final result of the computation since it does not need the intermediate results anymore.</p> <p>Language-supported implementations of cooperative tasks are often even able to back up the required parts of the call stack before pausing. As an example, Rust’s async/await implementation stores all local variables that are still needed in an automatically generated struct (see below). By backing up the relevant parts of the call stack before pausing, all tasks can share a single call stack, which results in much lower memory consumption per task. This makes it possible to create an almost arbitrary number of cooperative tasks without running out of memory.</p> <h4 id="discussion-1"><a class="zola-anchor" href="#discussion-1" aria-label="Anchor link for: discussion-1">🔗</a>Discussion</h4> <p>The drawback of cooperative multitasking is that an uncooperative task can potentially run for an unlimited amount of time. Thus, a malicious or buggy task can prevent other tasks from running and slow down or even block the whole system. For this reason, cooperative multitasking should only be used when all tasks are known to cooperate.
As a counterexample, it’s not a good idea to make the operating system rely on the cooperation of arbitrary user-level programs.</p> <p>However, the strong performance and memory benefits of cooperative multitasking make it a good approach for usage <em>within</em> a program, especially in combination with asynchronous operations. Since an operating system kernel is a performance-critical program that interacts with asynchronous hardware, cooperative multitasking seems like a good approach for implementing concurrency.</p> <h2 id="async-await-in-rust"><a class="zola-anchor" href="#async-await-in-rust" aria-label="Anchor link for: async-await-in-rust">🔗</a>Async/Await in Rust</h2> <p>The Rust language provides first-class support for cooperative multitasking in the form of async/await. Before we can explore what async/await is and how it works, we need to understand how <em>futures</em> and asynchronous programming work in Rust.</p> <h3 id="futures"><a class="zola-anchor" href="#futures" aria-label="Anchor link for: futures">🔗</a>Futures</h3> <p>A <em>future</em> represents a value that might not be available yet. This could be, for example, an integer that is computed by another task or a file that is downloaded from the network. Instead of waiting until the value is available, futures make it possible to continue execution until the value is needed.</p> <h4 id="example"><a class="zola-anchor" href="#example" aria-label="Anchor link for: example">🔗</a>Example</h4> <p>The concept of futures is best illustrated with a small example:</p> <p><img src="https://os.phil-opp.com/async-await/async-example.svg" alt="Sequence diagram: main calls read_file and is blocked until it returns; then it calls foo() and is also blocked until it returns. The same process is repeated, but this time async_read_file is called, which directly returns a future; then foo() is called again, which now runs concurrently with the file load. The file is available before foo() returns." 
/></p> <p>This sequence diagram shows a <code>main</code> function that reads a file from the file system and then calls a function <code>foo</code>. This process is repeated two times: once with a synchronous <code>read_file</code> call and once with an asynchronous <code>async_read_file</code> call.</p> <p>With the synchronous call, the <code>main</code> function needs to wait until the file is loaded from the file system. Only then can it call the <code>foo</code> function, which requires it to again wait for the result.</p> <p>With the asynchronous <code>async_read_file</code> call, the file system directly returns a future and loads the file asynchronously in the background. This allows the <code>main</code> function to call <code>foo</code> much earlier, which then runs in parallel with the file load. In this example, the file load even finishes before <code>foo</code> returns, so <code>main</code> can directly work with the file without further waiting after <code>foo</code> returns.</p> <h4 id="futures-in-rust"><a class="zola-anchor" href="#futures-in-rust" aria-label="Anchor link for: futures-in-rust">🔗</a>Futures in Rust</h4> <p>In Rust, futures are represented by the <a href="https://doc.rust-lang.org/nightly/core/future/trait.Future.html"><code>Future</code></a> trait, which looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub trait </span><span>Future { </span><span> </span><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">Output</span><span>; </span><span> </span><span style="color:#569cd6;">fn </span><span>poll(self: Pin&lt;</span><span style="color:#569cd6;">&amp;mut Self</span><span>&gt;, cx: </span><span style="color:#569cd6;">&amp;mut</span><span> Context) -&gt; Poll&lt;</span><span style="color:#569cd6;">Self::</span><span>Output&gt;; </span><span>} </span></code></pre> <p>The <a 
href="https://doc.rust-lang.org/book/ch19-03-advanced-traits.html#specifying-placeholder-types-in-trait-definitions-with-associated-types">associated type</a> <code>Output</code> specifies the type of the asynchronous value. For example, the <code>async_read_file</code> function in the diagram above would return a <code>Future</code> instance with <code>Output</code> set to <code>File</code>.</p> <p>The <a href="https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll"><code>poll</code></a> method allows checking whether the value is already available. It returns a <a href="https://doc.rust-lang.org/nightly/core/task/enum.Poll.html"><code>Poll</code></a> enum, which looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub enum </span><span>Poll&lt;T&gt; { </span><span> Ready(T), </span><span> Pending, </span><span>} </span></code></pre> <p>When the value is already available (e.g., the file was fully read from disk), it is returned wrapped in the <code>Ready</code> variant. Otherwise, the <code>Pending</code> variant is returned, which signals to the caller that the value is not yet available.</p> <p>The <code>poll</code> method takes two arguments: <code>self: Pin&lt;&amp;mut Self&gt;</code> and <code>cx: &amp;mut Context</code>. The former behaves similarly to a normal <code>&amp;mut self</code> reference, except that the <code>Self</code> value is <a href="https://doc.rust-lang.org/nightly/core/pin/index.html"><em>pinned</em></a> to its memory location. Understanding <code>Pin</code> and why it is needed is difficult without understanding how async/await works first.
We will therefore explain it later in this post.</p> <p>The purpose of the <code>cx: &amp;mut Context</code> parameter is to pass a <a href="https://doc.rust-lang.org/nightly/core/task/struct.Waker.html"><code>Waker</code></a> instance to the asynchronous task, e.g., the file system load. This <code>Waker</code> allows the asynchronous task to signal that it (or a part of it) is finished, e.g., that the file was loaded from disk. Since the main task knows that it will be notified when the <code>Future</code> is ready, it does not need to call <code>poll</code> over and over again. We will explain this process in more detail later in this post when we implement our own waker type.</p> <h3 id="working-with-futures"><a class="zola-anchor" href="#working-with-futures" aria-label="Anchor link for: working-with-futures">🔗</a>Working with Futures</h3> <p>We now know how futures are defined and understand the basic idea behind the <code>poll</code> method. However, we still don’t know how to effectively work with futures. The problem is that futures represent the results of asynchronous tasks, which might not be available yet. In practice, however, we often need these values directly for further calculations. So the question is: How can we efficiently retrieve the value of a future when we need it?</p> <h4 id="waiting-on-futures"><a class="zola-anchor" href="#waiting-on-futures" aria-label="Anchor link for: waiting-on-futures">🔗</a>Waiting on Futures</h4> <p>One possible answer is to wait until a future becomes ready. 
This could look something like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">let</span><span> future = async_read_file(</span><span style="color:#d69d85;">&quot;foo.txt&quot;</span><span>); </span><span style="color:#569cd6;">let</span><span> file_content = </span><span style="color:#569cd6;">loop </span><span>{ </span><span> </span><span style="color:#569cd6;">match</span><span> future.poll(…) { </span><span> Poll::Ready(value) </span><span style="color:#569cd6;">=&gt; break</span><span> value, </span><span> Poll::Pending </span><span style="color:#569cd6;">=&gt; </span><span>{}, </span><span style="color:#608b4e;">// do nothing </span><span> } </span><span>} </span></code></pre> <p>Here we <em>actively</em> wait for the future by calling <code>poll</code> over and over again in a loop. The arguments to <code>poll</code> don’t matter here, so we omitted them. While this solution works, it is very inefficient because we keep the CPU busy until the value becomes available.</p> <p>A more efficient approach could be to <em>block</em> the current thread until the future becomes available. This is, of course, only possible if you have threads, so this solution does not work for our kernel, at least not yet. Even on systems where blocking is supported, it is often not desired because it turns an asynchronous task into a synchronous task again, thereby inhibiting the potential performance benefits of parallel tasks.</p> <h4 id="future-combinators"><a class="zola-anchor" href="#future-combinators" aria-label="Anchor link for: future-combinators">🔗</a>Future Combinators</h4> <p>An alternative to waiting is to use future combinators. 
Future combinators are methods like <code>map</code> that allow chaining and combining futures together, similar to the methods of the <a href="https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html"><code>Iterator</code></a> trait. Instead of waiting on the future, these combinators return a future themselves, which applies the mapping operation on <code>poll</code>.</p> <p>As an example, a simple <code>string_len</code> combinator for converting a <code>Future&lt;Output = String&gt;</code> to a <code>Future&lt;Output = usize&gt;</code> could look like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">struct </span><span>StringLen&lt;F&gt; { </span><span> inner_future: F, </span><span>} </span><span> </span><span style="color:#569cd6;">impl</span><span>&lt;F&gt; Future </span><span style="color:#569cd6;">for </span><span>StringLen&lt;F&gt; </span><span style="color:#569cd6;">where</span><span> F: Future&lt;Output = String&gt; { </span><span> </span><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">Output </span><span>= </span><span style="color:#569cd6;">usize</span><span>; </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>poll(</span><span style="color:#569cd6;">mut </span><span>self: Pin&lt;</span><span style="color:#569cd6;">&amp;mut Self</span><span>&gt;, cx: </span><span style="color:#569cd6;">&amp;mut </span><span>Context&lt;&#39;</span><span style="color:#569cd6;">_</span><span>&gt;) -&gt; Poll&lt;</span><span style="color:#569cd6;">Self::</span><span>Output&gt; { </span><span> </span><span style="color:#569cd6;">match </span><span>self.inner_future.poll(cx) { </span><span> Poll::Ready(s) </span><span style="color:#569cd6;">=&gt; </span><span>Poll::Ready(s.len()), </span><span> Poll::Pending </span><span style="color:#569cd6;">=&gt; </span><span>Poll::Pending, </span><span> } </span><span> } </span><span>} </span><span> </span><span
style="color:#569cd6;">fn </span><span>string_len(string: impl Future&lt;Output = String&gt;) </span><span> -&gt; impl Future&lt;Output = </span><span style="color:#569cd6;">usize</span><span>&gt; </span><span>{ </span><span> StringLen { </span><span> inner_future: string, </span><span> } </span><span>} </span><span> </span><span style="color:#608b4e;">// Usage </span><span style="color:#569cd6;">fn </span><span>file_len() -&gt; impl Future&lt;Output = </span><span style="color:#569cd6;">usize</span><span>&gt; { </span><span> </span><span style="color:#569cd6;">let</span><span> file_content_future = async_read_file(</span><span style="color:#d69d85;">&quot;foo.txt&quot;</span><span>); </span><span> string_len(file_content_future) </span><span>} </span></code></pre> <p>This code does not quite work because it does not handle <a href="https://doc.rust-lang.org/stable/core/pin/index.html"><em>pinning</em></a>, but it suffices as an example. The basic idea is that the <code>string_len</code> function wraps a given <code>Future</code> instance into a new <code>StringLen</code> struct, which also implements <code>Future</code>. When the wrapper future is polled, it polls the inner future. If the value is not ready yet, <code>Poll::Pending</code> is returned from the wrapper future too. If the value is ready, the string is extracted from the <code>Poll::Ready</code> variant and its length is calculated. Afterwards, it is wrapped in <code>Poll::Ready</code> again and returned.</p> <p>With this <code>string_len</code> function, we can calculate the length of an asynchronous string without waiting for it. Since the function returns a <code>Future</code> again, the caller can’t work directly on the returned value, but needs to use combinator functions again.
This way, the whole call graph becomes asynchronous and we can efficiently wait for multiple futures at once at some point, e.g., in the main function.</p> <p>Because manually writing combinator functions is difficult, they are often provided by libraries. While the Rust standard library itself provides no combinator methods yet, the semi-official (and <code>no_std</code> compatible) <a href="https://docs.rs/futures/0.3.4/futures/"><code>futures</code></a> crate does. Its <a href="https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html"><code>FutureExt</code></a> trait provides high-level combinator methods such as <a href="https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.map"><code>map</code></a> or <a href="https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.then"><code>then</code></a>, which can be used to manipulate the result with arbitrary closures.</p> <h5 id="advantages"><a class="zola-anchor" href="#advantages" aria-label="Anchor link for: advantages">🔗</a>Advantages</h5> <p>The big advantage of future combinators is that they keep the operations asynchronous. In combination with asynchronous I/O interfaces, this approach can lead to very high performance. The fact that future combinators are implemented as normal structs with trait implementations allows the compiler to aggressively optimize them. For more details, see the <a href="https://aturon.github.io/blog/2016/08/11/futures/"><em>Zero-cost futures in Rust</em></a> post, which announced the addition of futures to the Rust ecosystem.</p> <h5 id="drawbacks"><a class="zola-anchor" href="#drawbacks" aria-label="Anchor link for: drawbacks">🔗</a>Drawbacks</h5> <p>While future combinators make it possible to write very efficient code, they can be difficult to use in some situations because of the type system and the closure-based interface.
For example, consider code like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>example(min_len: </span><span style="color:#569cd6;">usize</span><span>) -&gt; impl Future&lt;Output = String&gt; { </span><span> async_read_file(</span><span style="color:#d69d85;">&quot;foo.txt&quot;</span><span>).then(</span><span style="color:#569cd6;">move |</span><span>content</span><span style="color:#569cd6;">| </span><span>{ </span><span> </span><span style="color:#569cd6;">if</span><span> content.len() &lt; min_len { </span><span> Either::Left(async_read_file(</span><span style="color:#d69d85;">&quot;bar.txt&quot;</span><span>).map(|s| content + </span><span style="color:#569cd6;">&amp;</span><span>s)) </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> Either::Right(future::ready(content)) </span><span> } </span><span> }) </span><span>} </span></code></pre> <p>(<a href="https://play.rust-lang.org/?version=stable&amp;mode=debug&amp;edition=2018&amp;gist=91fc09024eecb2448a85a7ef6a97b8d8">Try it on the playground</a>)</p> <p>Here we read the file <code>foo.txt</code> and then use the <a href="https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.then"><code>then</code></a> combinator to chain a second future based on the file content. If the content length is smaller than the given <code>min_len</code>, we read a different <code>bar.txt</code> file and append it to <code>content</code> using the <a href="https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.map"><code>map</code></a> combinator. 
Otherwise, we return only the content of <code>foo.txt</code>.</p> <p>We need to use the <a href="https://doc.rust-lang.org/std/keyword.move.html"><code>move</code> keyword</a> for the closure passed to <code>then</code> because otherwise there would be a lifetime error for <code>min_len</code>. The reason for the <a href="https://docs.rs/futures/0.3.4/futures/future/enum.Either.html"><code>Either</code></a> wrapper is that <code>if</code> and <code>else</code> blocks must always have the same type. Since we return different future types in the blocks, we must use the wrapper type to unify them into a single type. The <a href="https://docs.rs/futures/0.3.4/futures/future/fn.ready.html"><code>ready</code></a> function wraps a value in a future that is immediately ready. The function is required here because the <code>Either</code> wrapper expects that the wrapped value implements <code>Future</code>.</p> <p>As you can imagine, this can quickly lead to very complex code for larger projects. It gets especially complicated if borrowing and different lifetimes are involved. For this reason, a lot of work was invested in adding support for async/await to Rust, with the goal of making asynchronous code radically simpler to write.</p> <h3 id="the-async-await-pattern"><a class="zola-anchor" href="#the-async-await-pattern" aria-label="Anchor link for: the-async-await-pattern">🔗</a>The Async/Await Pattern</h3> <p>The idea behind async/await is to let the programmer write code that <em>looks</em> like normal synchronous code, but is turned into asynchronous code by the compiler. It works based on the two keywords <code>async</code> and <code>await</code>.
The <code>async</code> keyword can be used in a function signature to turn a synchronous function into an asynchronous function that returns a future:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>async </span><span style="color:#569cd6;">fn </span><span>foo() -&gt; </span><span style="color:#569cd6;">u32 </span><span>{ </span><span> </span><span style="color:#b5cea8;">0 </span><span>} </span><span> </span><span style="color:#608b4e;">// the above is roughly translated by the compiler to: </span><span style="color:#569cd6;">fn </span><span>foo() -&gt; impl Future&lt;Output = </span><span style="color:#569cd6;">u32</span><span>&gt; { </span><span> future::ready(</span><span style="color:#b5cea8;">0</span><span>) </span><span>} </span></code></pre> <p>This keyword alone wouldn’t be that useful. However, inside <code>async</code> functions, the <code>await</code> keyword can be used to retrieve the asynchronous value of a future:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>async </span><span style="color:#569cd6;">fn </span><span>example(min_len: </span><span style="color:#569cd6;">usize</span><span>) -&gt; String { </span><span> </span><span style="color:#569cd6;">let</span><span> content = async_read_file(</span><span style="color:#d69d85;">&quot;foo.txt&quot;</span><span>).await; </span><span> </span><span style="color:#569cd6;">if</span><span> content.len() &lt; min_len { </span><span> content + </span><span style="color:#569cd6;">&amp;</span><span>async_read_file(</span><span style="color:#d69d85;">&quot;bar.txt&quot;</span><span>).await </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> content </span><span> } </span><span>} </span></code></pre> <p>(<a 
href="https://play.rust-lang.org/?version=stable&amp;mode=debug&amp;edition=2018&amp;gist=d93c28509a1c67661f31ff820281d434">Try it on the playground</a>)</p> <p>This function is a direct translation of the <code>example</code> function from <a href="https://os.phil-opp.com/async-await/#drawbacks">above</a> that used combinator functions. Using the <code>.await</code> operator, we can retrieve the value of a future without needing any closures or <code>Either</code> types. As a result, we can write our code like we write normal synchronous code, with the difference that <em>this is still asynchronous code</em>.</p> <h4 id="state-machine-transformation"><a class="zola-anchor" href="#state-machine-transformation" aria-label="Anchor link for: state-machine-transformation">🔗</a>State Machine Transformation</h4> <p>Behind the scenes, the compiler converts the body of the <code>async</code> function into a <a href="https://en.wikipedia.org/wiki/Finite-state_machine"><em>state machine</em></a>, with each <code>.await</code> call representing a different state. For the above <code>example</code> function, the compiler creates a state machine with the following four states:</p> <p><img src="https://os.phil-opp.com/async-await/async-state-machine-states.svg" alt="Four states: start, waiting on foo.txt, waiting on bar.txt, end" /></p> <p>Each state represents a different pause point in the function. The <em>“Start”</em> and <em>“End”</em> states represent the function at the beginning and end of its execution. The <em>“Waiting on foo.txt”</em> state represents that the function is currently waiting for the first <code>async_read_file</code> result. 
Similarly, the <em>“Waiting on bar.txt”</em> state represents the pause point where the function is waiting on the second <code>async_read_file</code> result.</p> <p>The state machine implements the <code>Future</code> trait by making each <code>poll</code> call a possible state transition:</p> <p><img src="https://os.phil-opp.com/async-await/async-state-machine-basic.svg" alt="Four states and their transitions: start, waiting on foo.txt, waiting on bar.txt, end" /></p> <p>The diagram uses arrows to represent state switches and diamond shapes to represent alternative paths. For example, if the <code>foo.txt</code> file is not ready, the path marked with <em>“no”</em> is taken and the <em>“Waiting on foo.txt”</em> state is reached. Otherwise, the <em>“yes”</em> path is taken. The small red diamond without a caption represents the <code>if content.len() &lt; min_len</code> branch of the <code>example</code> function.</p> <p>We see that the first <code>poll</code> call starts the function and lets it run until it reaches a future that is not ready yet. If all futures on the path are ready, the function can run until the <em>“End”</em> state, where it returns its result wrapped in <code>Poll::Ready</code>. Otherwise, the state machine enters a waiting state and returns <code>Poll::Pending</code>. On the next <code>poll</code> call, the state machine then starts from the last waiting state and retries the last operation.</p> <h4 id="saving-state-2"><a class="zola-anchor" href="#saving-state-2" aria-label="Anchor link for: saving-state-2">🔗</a>Saving State</h4> <p>To continue from the last waiting state, the state machine must keep track of the current state internally. In addition, it must save all the variables that it needs to continue execution on the next <code>poll</code> call. 
This is where the compiler can really shine: Since it knows which variables are used when, it can automatically generate structs with exactly the variables that are needed.</p> <p>As an example, the compiler generates structs like the following for the above <code>example</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// The `example` function again so that you don&#39;t have to scroll up </span><span>async </span><span style="color:#569cd6;">fn </span><span>example(min_len: </span><span style="color:#569cd6;">usize</span><span>) -&gt; String { </span><span> </span><span style="color:#569cd6;">let</span><span> content = async_read_file(</span><span style="color:#d69d85;">&quot;foo.txt&quot;</span><span>).await; </span><span> </span><span style="color:#569cd6;">if</span><span> content.len() &lt; min_len { </span><span> content + </span><span style="color:#569cd6;">&amp;</span><span>async_read_file(</span><span style="color:#d69d85;">&quot;bar.txt&quot;</span><span>).await </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> content </span><span> } </span><span>} </span><span> </span><span style="color:#608b4e;">// The compiler-generated state structs: </span><span> </span><span style="color:#569cd6;">struct </span><span>StartState { </span><span> min_len: </span><span style="color:#569cd6;">usize</span><span>, </span><span>} </span><span> </span><span style="color:#569cd6;">struct </span><span>WaitingOnFooTxtState { </span><span> min_len: </span><span style="color:#569cd6;">usize</span><span>, </span><span> foo_txt_future: impl Future&lt;Output = String&gt;, </span><span>} </span><span> </span><span style="color:#569cd6;">struct </span><span>WaitingOnBarTxtState { </span><span> content: String, </span><span> bar_txt_future: impl Future&lt;Output = String&gt;, </span><span>} </span><span> 
</span><span style="color:#569cd6;">struct </span><span>EndState {} </span></code></pre> <p>In the <em>“Start”</em> and <em>“Waiting on foo.txt”</em> states, the <code>min_len</code> parameter needs to be stored for the later comparison with <code>content.len()</code>. The <em>“Waiting on foo.txt”</em> state additionally stores a <code>foo_txt_future</code>, which represents the future returned by the <code>async_read_file</code> call. This future needs to be polled again when the state machine continues, so it needs to be saved.</p> <p>The <em>“Waiting on bar.txt”</em> state contains the <code>content</code> variable for the later string concatenation when <code>bar.txt</code> is ready. It also stores a <code>bar_txt_future</code> that represents the in-progress load of <code>bar.txt</code>. The struct does not contain the <code>min_len</code> variable because it is no longer needed after the <code>content.len()</code> comparison. In the <em>“End”</em> state, no variables are stored because the function has already run to completion.</p> <p>Keep in mind that this is only an example of the code that the compiler could generate. The struct names and the field layout are implementation details and might be different.</p> <h4 id="the-full-state-machine-type"><a class="zola-anchor" href="#the-full-state-machine-type" aria-label="Anchor link for: the-full-state-machine-type">🔗</a>The Full State Machine Type</h4> <p>While the exact compiler-generated code is an implementation detail, it helps to imagine how the generated state machine <em>could</em> look for the <code>example</code> function. We already defined the structs representing the different states and containing the required variables. 
To create a state machine on top of them, we can combine them into an <a href="https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html"><code>enum</code></a>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">enum </span><span>ExampleStateMachine { </span><span> Start(StartState), </span><span> WaitingOnFooTxt(WaitingOnFooTxtState), </span><span> WaitingOnBarTxt(WaitingOnBarTxtState), </span><span> End(EndState), </span><span>} </span></code></pre> <p>We define a separate enum variant for each state and add the corresponding state struct to each variant as a field. To implement the state transitions, the compiler generates an implementation of the <code>Future</code> trait based on the <code>example</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">impl </span><span>Future </span><span style="color:#569cd6;">for </span><span>ExampleStateMachine { </span><span> </span><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">Output </span><span>= String; </span><span style="color:#608b4e;">// return type of `example` </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>poll(self: Pin&lt;</span><span style="color:#569cd6;">&amp;mut Self</span><span>&gt;, cx: </span><span style="color:#569cd6;">&amp;mut</span><span> Context) -&gt; Poll&lt;</span><span style="color:#569cd6;">Self::</span><span>Output&gt; { </span><span> </span><span style="color:#569cd6;">loop </span><span>{ </span><span> </span><span style="color:#569cd6;">match </span><span>self { </span><span style="color:#608b4e;">// TODO: handle pinning </span><span> ExampleStateMachine::Start(state) </span><span style="color:#569cd6;">=&gt; </span><span>{…} </span><span> ExampleStateMachine::WaitingOnFooTxt(state) 
</span><span style="color:#569cd6;">=&gt; </span><span>{…} </span><span> ExampleStateMachine::WaitingOnBarTxt(state) </span><span style="color:#569cd6;">=&gt; </span><span>{…} </span><span> ExampleStateMachine::End(state) </span><span style="color:#569cd6;">=&gt; </span><span>{…} </span><span> } </span><span> } </span><span> } </span><span>} </span></code></pre> <p>The <code>Output</code> type of the future is <code>String</code> because it’s the return type of the <code>example</code> function. To implement the <code>poll</code> function, we use a <code>match</code> statement on the current state inside a <code>loop</code>. The idea is that we switch to the next state as long as possible and use an explicit <code>return Poll::Pending</code> when we can’t continue.</p> <p>For simplicity, we only show simplified code and don’t handle <a href="https://doc.rust-lang.org/stable/core/pin/index.html">pinning</a>, ownership, lifetimes, etc. So this and the following code should be treated as pseudo-code and not used directly. Of course, the real compiler-generated code handles everything correctly, albeit possibly in a different way.</p> <p>To keep the code excerpts small, we present the code for each <code>match</code> arm separately. 
Let’s begin with the <code>Start</code> state:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>ExampleStateMachine::Start(state) </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#608b4e;">// from body of `example` </span><span> </span><span style="color:#569cd6;">let</span><span> foo_txt_future = async_read_file(</span><span style="color:#d69d85;">&quot;foo.txt&quot;</span><span>); </span><span> </span><span style="color:#608b4e;">// `.await` operation </span><span> </span><span style="color:#569cd6;">let</span><span> state = WaitingOnFooTxtState { </span><span> min_len: state.min_len, </span><span> foo_txt_future, </span><span> }; </span><span> *self = ExampleStateMachine::WaitingOnFooTxt(state); </span><span>} </span></code></pre> <p>The state machine is in the <code>Start</code> state when it is right at the beginning of the function. In this case, we execute all the code from the body of the <code>example</code> function until the first <code>.await</code>. 
To handle the <code>.await</code> operation, we change the state of the <code>self</code> state machine to <code>WaitingOnFooTxt</code>, which includes the construction of the <code>WaitingOnFooTxtState</code> struct.</p> <p>Since the <code>match self {…}</code> statement is executed in a loop, the execution jumps to the <code>WaitingOnFooTxt</code> arm next:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>ExampleStateMachine::WaitingOnFooTxt(state) </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#569cd6;">match</span><span> state.foo_txt_future.poll(cx) { </span><span> Poll::Pending </span><span style="color:#569cd6;">=&gt; return </span><span>Poll::Pending, </span><span> Poll::Ready(content) </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#608b4e;">// from body of `example` </span><span> </span><span style="color:#569cd6;">if</span><span> content.len() &lt; state.min_len { </span><span> </span><span style="color:#569cd6;">let</span><span> bar_txt_future = async_read_file(</span><span style="color:#d69d85;">&quot;bar.txt&quot;</span><span>); </span><span> </span><span style="color:#608b4e;">// `.await` operation </span><span> </span><span style="color:#569cd6;">let</span><span> state = WaitingOnBarTxtState { </span><span> content, </span><span> bar_txt_future, </span><span> }; </span><span> *self = ExampleStateMachine::WaitingOnBarTxt(state); </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> *self = ExampleStateMachine::End(EndState); </span><span> </span><span style="color:#569cd6;">return </span><span>Poll::Ready(content); </span><span> } </span><span> } </span><span> } </span><span>} </span></code></pre> <p>In this <code>match</code> arm, we first call the <code>poll</code> function of the <code>foo_txt_future</code>. 
If it is not ready, we exit the loop and return <code>Poll::Pending</code>. Since <code>self</code> stays in the <code>WaitingOnFooTxt</code> state in this case, the next <code>poll</code> call on the state machine will enter the same <code>match</code> arm and retry polling the <code>foo_txt_future</code>.</p> <p>When the <code>foo_txt_future</code> is ready, we assign the result to the <code>content</code> variable and continue to execute the code of the <code>example</code> function: If <code>content.len()</code> is smaller than the <code>min_len</code> saved in the state struct, the <code>bar.txt</code> file is read asynchronously. We again translate the <code>.await</code> operation into a state change, this time into the <code>WaitingOnBarTxt</code> state. Since we’re executing the <code>match</code> inside a loop, the execution directly jumps to the <code>match</code> arm for the new state afterward, where the <code>bar_txt_future</code> is polled.</p> <p>In case we enter the <code>else</code> branch, no further <code>.await</code> operation occurs. We reach the end of the function and return <code>content</code> wrapped in <code>Poll::Ready</code>. 
We also change the current state to the <code>End</code> state.</p> <p>The code for the <code>WaitingOnBarTxt</code> state looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>ExampleStateMachine::WaitingOnBarTxt(state) </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#569cd6;">match</span><span> state.bar_txt_future.poll(cx) { </span><span> Poll::Pending </span><span style="color:#569cd6;">=&gt; return </span><span>Poll::Pending, </span><span> Poll::Ready(bar_txt) </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> *self = ExampleStateMachine::End(EndState); </span><span> </span><span style="color:#608b4e;">// from body of `example` </span><span> </span><span style="color:#569cd6;">return </span><span>Poll::Ready(state.content + </span><span style="color:#569cd6;">&amp;</span><span>bar_txt); </span><span> } </span><span> } </span><span>} </span></code></pre> <p>Similar to the <code>WaitingOnFooTxt</code> state, we start by polling the <code>bar_txt_future</code>. If it is still pending, we exit the loop and return <code>Poll::Pending</code>. Otherwise, we can perform the last operation of the <code>example</code> function: concatenating the <code>content</code> variable with the result from the future. 
We update the state machine to the <code>End</code> state and then return the result wrapped in <code>Poll::Ready</code>.</p> <p>Finally, the code for the <code>End</code> state looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>ExampleStateMachine::End(</span><span style="color:#569cd6;">_</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> panic!(</span><span style="color:#d69d85;">&quot;poll called after Poll::Ready was returned&quot;</span><span>); </span><span>} </span></code></pre> <p>Futures should not be polled again after they returned <code>Poll::Ready</code>, so we panic if <code>poll</code> is called while we are already in the <code>End</code> state.</p> <p>We now know what the compiler-generated state machine and its implementation of the <code>Future</code> trait <em>could</em> look like. In practice, the compiler generates code in a different way. (In case you’re interested, the implementation is currently based on <a href="https://doc.rust-lang.org/stable/unstable-book/language-features/coroutines.html"><em>coroutines</em></a>, but this is only an implementation detail.)</p> <p>The last piece of the puzzle is the generated code for the <code>example</code> function itself. Remember, the function header was defined like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>async </span><span style="color:#569cd6;">fn </span><span>example(min_len: </span><span style="color:#569cd6;">usize</span><span>) -&gt; String </span></code></pre> <p>Since the complete function body is now implemented by the state machine, the only thing that the function needs to do is to initialize the state machine and return it. 
The generated code for this could look like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>example(min_len: </span><span style="color:#569cd6;">usize</span><span>) -&gt; ExampleStateMachine { </span><span> ExampleStateMachine::Start(StartState { </span><span> min_len, </span><span> }) </span><span>} </span></code></pre> <p>The function no longer has an <code>async</code> modifier since it now explicitly returns an <code>ExampleStateMachine</code> type, which implements the <code>Future</code> trait. As expected, the state machine is constructed in the <code>Start</code> state and the corresponding state struct is initialized with the <code>min_len</code> parameter.</p> <p>Note that this function does not start the execution of the state machine. This is a fundamental design decision of futures in Rust: they do nothing until they are polled for the first time.</p> <h3 id="pinning"><a class="zola-anchor" href="#pinning" aria-label="Anchor link for: pinning">🔗</a>Pinning</h3> <p>We already stumbled across <em>pinning</em> multiple times in this post. Now is finally the time to explore what pinning is and why it is needed.</p> <h4 id="self-referential-structs"><a class="zola-anchor" href="#self-referential-structs" aria-label="Anchor link for: self-referential-structs">🔗</a>Self-Referential Structs</h4> <p>As explained above, the state machine transformation stores the local variables of each pause point in a struct. For small examples like our <code>example</code> function, this was straightforward and did not lead to any problems. However, things become more difficult when variables reference each other. 
For example, consider this function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>async </span><span style="color:#569cd6;">fn </span><span>pin_example() -&gt; </span><span style="color:#569cd6;">i32 </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> array = [</span><span style="color:#b5cea8;">1</span><span>, </span><span style="color:#b5cea8;">2</span><span>, </span><span style="color:#b5cea8;">3</span><span>]; </span><span> </span><span style="color:#569cd6;">let</span><span> element = </span><span style="color:#569cd6;">&amp;</span><span>array[</span><span style="color:#b5cea8;">2</span><span>]; </span><span> async_write_file(</span><span style="color:#d69d85;">&quot;foo.txt&quot;</span><span>, element.to_string()).await; </span><span> *element </span><span>} </span></code></pre> <p>This function creates a small <code>array</code> with the contents <code>1</code>, <code>2</code>, and <code>3</code>. It then creates a reference to the last array element and stores it in an <code>element</code> variable. Next, it asynchronously writes the number converted to a string to a <code>foo.txt</code> file. Finally, it returns the number referenced by <code>element</code>.</p> <p>Since the function uses a single <code>await</code> operation, the resulting state machine has three states: start, end, and “waiting on write”. The function takes no arguments, so the struct for the start state is empty. Like before, the struct for the end state is empty because the function is finished at this point. 
The struct for the “waiting on write” state is more interesting:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">struct </span><span>WaitingOnWriteState { </span><span> array: [1, 2, 3], </span><span> element: 0x1001c, </span><span style="color:#608b4e;">// address of the last array element </span><span>} </span></code></pre> <p>We need to store both the <code>array</code> and <code>element</code> variables because <code>element</code> is required for the return value and <code>array</code> is referenced by <code>element</code>. Since <code>element</code> is a reference, it stores a <em>pointer</em> (i.e., a memory address) to the referenced element. We used <code>0x1001c</code> as an example memory address here. In reality, it needs to be the address of the last element of the <code>array</code> field, so it depends on where the struct lives in memory. Structs with such internal pointers are called <em>self-referential</em> structs because they reference themselves from one of their fields.</p> <h4 id="the-problem-with-self-referential-structs"><a class="zola-anchor" href="#the-problem-with-self-referential-structs" aria-label="Anchor link for: the-problem-with-self-referential-structs">🔗</a>The Problem with Self-Referential Structs</h4> <p>The internal pointer of our self-referential struct leads to a fundamental problem, which becomes apparent when we look at its memory layout:</p> <p><img src="https://os.phil-opp.com/async-await/self-referential-struct.svg" alt="array at 0x10014 with fields 1, 2, and 3; element at address 0x10020, pointing to the last array element at 0x1001c" /></p> <p>The <code>array</code> field starts at address <code>0x10014</code> and the <code>element</code> field at address <code>0x10020</code>. The <code>element</code> field points to address <code>0x1001c</code> because the last array element lives at this address. At this point, everything is still fine. 
However, an issue occurs when we move this struct to a different memory address:</p> <p><img src="https://os.phil-opp.com/async-await/self-referential-struct-moved.svg" alt="array at 0x10024 with fields 1, 2, and 3; element at address 0x10030, still pointing to 0x1001c, even though the last array element now lives at 0x1002c" /></p> <p>We moved the struct a bit so that it starts at address <code>0x10024</code> now. This could, for example, happen when we pass the struct as a function argument or assign it to a different stack variable. The problem is that the <code>element</code> field still points to address <code>0x1001c</code> even though the last <code>array</code> element now lives at address <code>0x1002c</code>. Thus, the pointer is dangling, with the result that undefined behavior occurs on the next <code>poll</code> call.</p> <h4 id="possible-solutions"><a class="zola-anchor" href="#possible-solutions" aria-label="Anchor link for: possible-solutions">🔗</a>Possible Solutions</h4> <p>There are three fundamental approaches to solving the dangling pointer problem:</p> <ul> <li> <p><strong>Update the pointer on move:</strong> The idea is to update the internal pointer whenever the struct is moved in memory so that it is still valid after the move. Unfortunately, this approach would require extensive changes to Rust that would result in potentially huge performance losses. The reason is that some kind of runtime would need to keep track of the type of all struct fields and check on every move operation whether a pointer update is required.</p> </li> <li> <p><strong>Store an offset instead of self-references:</strong> To avoid the requirement for updating pointers, the compiler could try to store self-references as offsets from the struct’s beginning instead. 
For example, the <code>element</code> field of the above <code>WaitingOnWriteState</code> struct could be stored in the form of an <code>element_offset</code> field with a value of 8 because the array element that the reference points to starts 8 bytes after the struct’s beginning (two 4-byte <code>i32</code> elements precede it). Since the offset stays the same when the struct is moved, no field updates are required.</p> <p>The problem with this approach is that it requires the compiler to detect all self-references. This is not possible at compile time because the value of a reference might depend on user input, so we would need a runtime system again to analyze references and correctly create the state structs. This would not only result in runtime costs but also prevent certain compiler optimizations, so it would again cause large performance losses.</p> </li> <li> <p><strong>Forbid moving the struct:</strong> As we saw above, the dangling pointer only occurs when we move the struct in memory. By completely forbidding move operations on self-referential structs, the problem can also be avoided. The big advantage of this approach is that it can be implemented at the type system level without additional runtime costs. The drawback is that it puts the burden of dealing with move operations on possibly self-referential structs on the programmer.</p> </li> </ul> <p>Rust chose the third solution because of its principle of providing <em>zero-cost abstractions</em>, which means that abstractions should not impose additional runtime costs. The <a href="https://doc.rust-lang.org/stable/core/pin/index.html"><em>pinning</em></a> API was proposed for this purpose in <a href="https://github.com/rust-lang/rfcs/blob/master/text/2349-pin.md">RFC 2349</a>. 
In the following, we will give a short overview of this API and explain how it works with async/await and futures.</p> <h4 id="heap-values"><a class="zola-anchor" href="#heap-values" aria-label="Anchor link for: heap-values">🔗</a>Heap Values</h4> <p>The first observation is that <a href="https://os.phil-opp.com/heap-allocation/">heap-allocated</a> values already have a fixed memory address most of the time. They are created using a call to <code>allocate</code> and then referenced by a pointer type such as <code>Box&lt;T&gt;</code>. While moving the pointer type is possible, the heap value that the pointer points to stays at the same memory address until it is freed through a <code>deallocate</code> call again.</p> <p>Using heap allocation, we can try to create a self-referential struct:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>main() { </span><span> </span><span style="color:#569cd6;">let mut</span><span> heap_value = Box::new(SelfReferential { </span><span> self_ptr: </span><span style="color:#b5cea8;">0 </span><span style="color:#569cd6;">as *const _</span><span>, </span><span> }); </span><span> </span><span style="color:#569cd6;">let</span><span> ptr = </span><span style="color:#569cd6;">&amp;</span><span>*heap_value </span><span style="color:#569cd6;">as *const</span><span> SelfReferential; </span><span> heap_value.self_ptr = ptr; </span><span> println!(</span><span style="color:#d69d85;">&quot;heap value at: </span><span style="color:#b4cea8;">{:p}</span><span style="color:#d69d85;">&quot;</span><span>, heap_value); </span><span> println!(</span><span style="color:#d69d85;">&quot;internal reference: </span><span style="color:#b4cea8;">{:p}</span><span style="color:#d69d85;">&quot;</span><span>, heap_value.self_ptr); </span><span>} </span><span> </span><span style="color:#569cd6;">struct </span><span>SelfReferential 
{ </span><span> self_ptr: </span><span style="color:#569cd6;">*const Self</span><span>, </span><span>} </span></code></pre> <p>(<a href="https://play.rust-lang.org/?version=stable&amp;mode=debug&amp;edition=2018&amp;gist=ce1aff3a37fcc1c8188eeaf0f39c97e8">Try it on the playground</a>)</p> <p>We create a simple struct named <code>SelfReferential</code> that contains a single pointer field. First, we initialize this struct with a null pointer and then allocate it on the heap using <code>Box::new</code>. We then determine the memory address of the heap-allocated struct and store it in a <code>ptr</code> variable. Finally, we make the struct self-referential by assigning the <code>ptr</code> variable to the <code>self_ptr</code> field.</p> <p>When we execute this code <a href="https://play.rust-lang.org/?version=stable&amp;mode=debug&amp;edition=2018&amp;gist=ce1aff3a37fcc1c8188eeaf0f39c97e8">on the playground</a>, we see that the address of the heap value and its internal pointer are equal, which means that the <code>self_ptr</code> field is a valid self-reference. 
Since the <code>heap_value</code> variable is only a pointer, moving it (e.g., by passing it to a function) does not change the address of the struct itself, so the <code>self_ptr</code> stays valid even if the pointer is moved.</p> <p>However, there is still a way to break this example: We can move out of a <code>Box&lt;T&gt;</code> or replace its content:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">let</span><span> stack_value = mem::replace(</span><span style="color:#569cd6;">&amp;mut </span><span>*heap_value, SelfReferential { </span><span> self_ptr: </span><span style="color:#b5cea8;">0 </span><span style="color:#569cd6;">as *const _</span><span>, </span><span>}); </span><span>println!(</span><span style="color:#d69d85;">&quot;value at: </span><span style="color:#b4cea8;">{:p}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#569cd6;">&amp;</span><span>stack_value); </span><span>println!(</span><span style="color:#d69d85;">&quot;internal reference: </span><span style="color:#b4cea8;">{:p}</span><span style="color:#d69d85;">&quot;</span><span>, stack_value.self_ptr); </span></code></pre> <p>(<a href="https://play.rust-lang.org/?version=stable&amp;mode=debug&amp;edition=2018&amp;gist=e160ee8a64cba4cebc1c0473dcecb7c8">Try it on the playground</a>)</p> <p>Here we use the <a href="https://doc.rust-lang.org/nightly/core/mem/fn.replace.html"><code>mem::replace</code></a> function to replace the heap-allocated value with a new struct instance. This allows us to move the original <code>heap_value</code> to the stack, while the <code>self_ptr</code> field of the struct is now a dangling pointer that still points to the old heap address. When you try to run the example on the playground, you see that the printed <em>“value at:”</em> and <em>“internal reference:”</em> lines indeed show different pointers. 
So heap-allocating a value is not enough to make self-references safe.</p> <p>The fundamental problem that allowed the above breakage is that <code>Box&lt;T&gt;</code> allows us to get a <code>&amp;mut T</code> reference to the heap-allocated value. This <code>&amp;mut</code> reference makes it possible to use functions like <a href="https://doc.rust-lang.org/nightly/core/mem/fn.replace.html"><code>mem::replace</code></a> or <a href="https://doc.rust-lang.org/nightly/core/mem/fn.swap.html"><code>mem::swap</code></a> to invalidate the heap-allocated value. To resolve this problem, we must prevent <code>&amp;mut</code> references to self-referential structs from being created.</p> <h4 id="pin-box-t-and-unpin"><a class="zola-anchor" href="#pin-box-t-and-unpin" aria-label="Anchor link for: pin-box-t-and-unpin">🔗</a><code>Pin&lt;Box&lt;T&gt;&gt;</code> and <code>Unpin</code></h4> <p>The pinning API provides a solution to the <code>&amp;mut T</code> problem in the form of the <a href="https://doc.rust-lang.org/stable/core/pin/struct.Pin.html"><code>Pin</code></a> wrapper type and the <a href="https://doc.rust-lang.org/nightly/std/marker/trait.Unpin.html"><code>Unpin</code></a> marker trait. The idea behind these types is to gate all methods of <code>Pin</code> that can be used to get <code>&amp;mut</code> references to the wrapped value (e.g., <a href="https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_mut"><code>get_mut</code></a> or <a href="https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.deref_mut"><code>deref_mut</code></a>) on the <code>Unpin</code> trait. The <code>Unpin</code> trait is an <a href="https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits"><em>auto trait</em></a>, which is automatically implemented for all types except those that explicitly opt out. 
By making self-referential structs opt out of <code>Unpin</code>, there is no (safe) way to get a <code>&amp;mut T</code> from a <code>Pin&lt;Box&lt;T&gt;&gt;</code> type for them. As a result, their internal self-references are guaranteed to stay valid.</p> <p>As an example, let’s update the <code>SelfReferential</code> type from above to opt out of <code>Unpin</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>core::marker::PhantomPinned; </span><span> </span><span style="color:#569cd6;">struct </span><span>SelfReferential { </span><span> self_ptr: </span><span style="color:#569cd6;">*const Self</span><span>, </span><span> _pin: PhantomPinned, </span><span>} </span></code></pre> <p>We opt out by adding a second <code>_pin</code> field of type <a href="https://doc.rust-lang.org/nightly/core/marker/struct.PhantomPinned.html"><code>PhantomPinned</code></a>. This type is a zero-sized marker type whose only purpose is to <em>not</em> implement the <code>Unpin</code> trait. Because of the way <a href="https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits">auto traits</a> work, a single field that is not <code>Unpin</code> suffices to make the complete struct opt out of <code>Unpin</code>.</p> <p>The second step is to change the <code>Box&lt;SelfReferential&gt;</code> type in the example to a <code>Pin&lt;Box&lt;SelfReferential&gt;&gt;</code> type. 
The easiest way to do this is to use the <a href="https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.pin"><code>Box::pin</code></a> function instead of <a href="https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.new"><code>Box::new</code></a> for creating the heap-allocated value:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">let mut</span><span> heap_value = Box::pin(SelfReferential { </span><span> self_ptr: </span><span style="color:#b5cea8;">0 </span><span style="color:#569cd6;">as *const _</span><span>, </span><span> _pin: PhantomPinned, </span><span>}); </span></code></pre> <p>In addition to changing <code>Box::new</code> to <code>Box::pin</code>, we also need to add the new <code>_pin</code> field in the struct initializer. Since <code>PhantomPinned</code> is a zero-sized type, we only need its type name to initialize it.</p> <p>When we <a href="https://play.rust-lang.org/?version=stable&amp;mode=debug&amp;edition=2018&amp;gist=961b0db194bbe851ff4d0ed08d3bd98a">try to run our adjusted example</a> now, we see that it no longer works:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error[E0594]: cannot assign to data in a dereference of `std::pin::Pin&lt;std::boxed::Box&lt;SelfReferential&gt;&gt;` </span><span> --&gt; src/main.rs:10:5 </span><span> | </span><span>10 | heap_value.self_ptr = ptr; </span><span> | ^^^^^^^^^^^^^^^^^^^^^^^^^ cannot assign </span><span> | </span><span> = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `std::pin::Pin&lt;std::boxed::Box&lt;SelfReferential&gt;&gt;` </span><span> </span><span>error[E0596]: cannot borrow data in a dereference of `std::pin::Pin&lt;std::boxed::Box&lt;SelfReferential&gt;&gt;` as mutable </span><span> --&gt; src/main.rs:16:36 </span><span> | </span><span>16 | let stack_value = 
mem::replace(&amp;mut *heap_value, SelfReferential { </span><span> | ^^^^^^^^^^^^^^^^ cannot borrow as mutable </span><span> | </span><span> = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `std::pin::Pin&lt;std::boxed::Box&lt;SelfReferential&gt;&gt;` </span></code></pre> <p>Both errors occur because the <code>Pin&lt;Box&lt;SelfReferential&gt;&gt;</code> type no longer implements the <code>DerefMut</code> trait. This is exactly what we wanted because the <code>DerefMut</code> trait would return a <code>&amp;mut</code> reference, which we wanted to prevent. This only happens because we both opted out of <code>Unpin</code> and changed <code>Box::new</code> to <code>Box::pin</code>.</p> <p>The problem now is that the compiler not only prevents moving the type in line 16, but also forbids initializing the <code>self_ptr</code> field in line 10. This happens because the compiler can’t differentiate between valid and invalid uses of <code>&amp;mut</code> references. 
To get the initialization working again, we have to use the unsafe <a href="https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_unchecked_mut"><code>get_unchecked_mut</code></a> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// safe because modifying a field doesn&#39;t move the whole struct </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> mut_ref = Pin::as_mut(</span><span style="color:#569cd6;">&amp;mut</span><span> heap_value); </span><span> Pin::get_unchecked_mut(mut_ref).self_ptr = ptr; </span><span>} </span></code></pre> <p>(<a href="https://play.rust-lang.org/?version=stable&amp;mode=debug&amp;edition=2018&amp;gist=b9ebbb11429d9d79b3f9fffe819e2018">Try it on the playground</a>)</p> <p>The <a href="https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_unchecked_mut"><code>get_unchecked_mut</code></a> function works on a <code>Pin&lt;&amp;mut T&gt;</code> instead of a <code>Pin&lt;Box&lt;T&gt;&gt;</code>, so we have to use <a href="https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.as_mut"><code>Pin::as_mut</code></a> for converting the value. Then we can set the <code>self_ptr</code> field using the <code>&amp;mut</code> reference returned by <code>get_unchecked_mut</code>.</p> <p>Now the only error left is the desired error on <code>mem::replace</code>. Remember, this operation tries to move the heap-allocated value to the stack, which would break the self-reference stored in the <code>self_ptr</code> field. By opting out of <code>Unpin</code> and using <code>Pin&lt;Box&lt;T&gt;&gt;</code>, we can prevent this operation at compile time and thus safely work with self-referential structs. 
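</p> <p>Putting the pieces of this section together, the pinned variant can be sketched as one small program. This is only a sketch under a few assumptions: it uses <code>std</code> instead of <code>core</code>/<code>alloc</code> so that it runs outside the kernel, it uses <code>std::ptr::null()</code> in place of the <code>0 as *const _</code> cast, and the <code>new_pinned</code> helper is a hypothetical name that does not appear in the post:</p>

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct SelfReferential {
    self_ptr: *const Self,
    // Opts the struct out of `Unpin`.
    _pin: PhantomPinned,
}

/// Hypothetical helper: allocates the struct on the heap, pins it,
/// and then stores the address of the struct in its own field.
fn new_pinned() -> Pin<Box<SelfReferential>> {
    let mut heap_value = Box::pin(SelfReferential {
        self_ptr: std::ptr::null(),
        _pin: PhantomPinned,
    });
    let ptr = &*heap_value as *const SelfReferential;
    // SAFETY: we only write a field through the reference and never
    // move the struct out of its heap allocation.
    unsafe {
        let mut_ref = Pin::as_mut(&mut heap_value);
        Pin::get_unchecked_mut(mut_ref).self_ptr = ptr;
    }
    heap_value
}

fn main() {
    let heap_value = new_pinned();
    assert_eq!(&*heap_value as *const SelfReferential, heap_value.self_ptr);

    // Moving the `Pin<Box<_>>` handle only moves the pointer, not the
    // heap-allocated struct, so the self-reference stays valid.
    let moved = heap_value;
    assert_eq!(&*moved as *const SelfReferential, moved.self_ptr);
}
```

<p>Reading through the pinned box still works because <code>Pin</code> implements <code>Deref</code>; only the mutable access is gated on <code>Unpin</code>.</p> <p>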
As we saw, the compiler is not able to prove that the creation of the self-reference is safe (yet), so we need to use an unsafe block and verify the correctness ourselves.</p> <h4 id="stack-pinning-and-pin-mut-t"><a class="zola-anchor" href="#stack-pinning-and-pin-mut-t" aria-label="Anchor link for: stack-pinning-and-pin-mut-t">🔗</a>Stack Pinning and <code>Pin&lt;&amp;mut T&gt;</code></h4> <p>In the previous section, we learned how to use <code>Pin&lt;Box&lt;T&gt;&gt;</code> to safely create a heap-allocated self-referential value. While this approach works fine and is relatively safe (apart from the unsafe construction), the required heap allocation comes with a performance cost. Since Rust strives to provide <em>zero-cost abstractions</em> whenever possible, the pinning API also allows creating <code>Pin&lt;&amp;mut T&gt;</code> instances that point to stack-allocated values.</p> <p>Unlike <code>Pin&lt;Box&lt;T&gt;&gt;</code> instances, which have <em>ownership</em> of the wrapped value, <code>Pin&lt;&amp;mut T&gt;</code> instances only temporarily borrow the wrapped value. This makes things more complicated, as it requires the programmer to ensure additional guarantees themselves. Most importantly, the referenced <code>T</code> must stay pinned for its whole lifetime, even after the <code>Pin&lt;&amp;mut T&gt;</code> that borrowed it is gone, which can be difficult to verify for stack-based variables. 
To help with this, crates like <a href="https://docs.rs/pin-utils/0.1.0-alpha.4/pin_utils/"><code>pin-utils</code></a> exist, but I still wouldn’t recommend pinning to the stack unless you really know what you’re doing.</p> <p>For further reading, check out the documentation of the <a href="https://doc.rust-lang.org/nightly/core/pin/index.html"><code>pin</code> module</a> and the <a href="https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.new_unchecked"><code>Pin::new_unchecked</code></a> method.</p> <h4 id="pinning-and-futures"><a class="zola-anchor" href="#pinning-and-futures" aria-label="Anchor link for: pinning-and-futures">🔗</a>Pinning and Futures</h4> <p>As we already saw in this post, the <a href="https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll"><code>Future::poll</code></a> method uses pinning in the form of a <code>Pin&lt;&amp;mut Self&gt;</code> parameter:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>poll(self: Pin&lt;</span><span style="color:#569cd6;">&amp;mut Self</span><span>&gt;, cx: </span><span style="color:#569cd6;">&amp;mut</span><span> Context) -&gt; Poll&lt;</span><span style="color:#569cd6;">Self::</span><span>Output&gt; </span></code></pre> <p>The reason that this method takes <code>self: Pin&lt;&amp;mut Self&gt;</code> instead of the normal <code>&amp;mut self</code> is that future instances created from async/await are often self-referential, as we saw <a href="https://os.phil-opp.com/async-await/#self-referential-structs">above</a>. By wrapping <code>Self</code> into <code>Pin</code> and letting the compiler opt out of <code>Unpin</code> for self-referential futures generated from async/await, it is guaranteed that the futures are not moved in memory between <code>poll</code> calls. 
This ensures that all internal references are still valid.</p> <p>It is worth noting that moving futures before the first <code>poll</code> call is fine. This is a result of the fact that futures are lazy and do nothing until they’re polled for the first time. The <code>start</code> state of the generated state machines therefore only contains the function arguments but no internal references. In order to call <code>poll</code>, the caller must wrap the future into <code>Pin</code> first, which ensures that the future cannot be moved in memory anymore. Since stack pinning is more difficult to get right, I recommend always using <a href="https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.pin"><code>Box::pin</code></a> combined with <a href="https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.as_mut"><code>Pin::as_mut</code></a> for this.</p> <p>In case you’re interested in understanding how to safely implement a future combinator function using stack pinning yourself, take a look at the relatively short <a href="https://docs.rs/futures-util/0.3.4/src/futures_util/future/future/map.rs.html">source of the <code>map</code> combinator method</a> of the <code>futures</code> crate and the section about <a href="https://doc.rust-lang.org/stable/std/pin/index.html#projections-and-structural-pinning">projections and structural pinning</a> of the pin documentation.</p> <h3 id="executors-and-wakers"><a class="zola-anchor" href="#executors-and-wakers" aria-label="Anchor link for: executors-and-wakers">🔗</a>Executors and Wakers</h3> <p>Using async/await, it is possible to ergonomically work with futures in a completely asynchronous way. However, as we learned above, futures do nothing until they are polled. 
This means we have to call <code>poll</code> on them at some point, otherwise the asynchronous code is never executed.</p> <p>For a single future, we can always wait manually using a loop <a href="https://os.phil-opp.com/async-await/#waiting-on-futures">as described above</a>. However, this approach is very inefficient and not practical for programs that create a large number of futures. The most common solution to this problem is to define a global <em>executor</em> that is responsible for polling all futures in the system until they are finished.</p> <h4 id="executors"><a class="zola-anchor" href="#executors" aria-label="Anchor link for: executors">🔗</a>Executors</h4> <p>The purpose of an executor is to allow spawning futures as independent tasks, typically through some sort of <code>spawn</code> method. The executor is then responsible for polling all futures until they are completed. The big advantage of managing all futures in a central place is that the executor can switch to a different future whenever a future returns <code>Poll::Pending</code>. Thus, asynchronous operations are run in parallel and the CPU is kept busy.</p> <p>Many executor implementations can also take advantage of systems with multiple CPU cores. They create a <a href="https://en.wikipedia.org/wiki/Thread_pool">thread pool</a> that is able to utilize all cores if there is enough work available and use techniques such as <a href="https://en.wikipedia.org/wiki/Work_stealing">work stealing</a> to balance the load between cores. 
There are also special executor implementations for embedded systems that optimize for low latency and memory overhead.</p> <p>To avoid the overhead of polling futures repeatedly, executors typically take advantage of the <em>waker</em> API supported by Rust’s futures.</p> <h4 id="wakers"><a class="zola-anchor" href="#wakers" aria-label="Anchor link for: wakers">🔗</a>Wakers</h4> <p>The idea behind the waker API is that a special <a href="https://doc.rust-lang.org/nightly/core/task/struct.Waker.html"><code>Waker</code></a> type is passed to each invocation of <code>poll</code>, wrapped in the <a href="https://doc.rust-lang.org/nightly/core/task/struct.Context.html"><code>Context</code></a> type. This <code>Waker</code> type is created by the executor and can be used by the asynchronous task to signal its (partial) completion. As a result, the executor does not need to call <code>poll</code> on a future that previously returned <code>Poll::Pending</code> until it is notified by the corresponding waker.</p> <p>This is best illustrated by a small example:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>async </span><span style="color:#569cd6;">fn </span><span>write_file() { </span><span> async_write_file(</span><span style="color:#d69d85;">&quot;foo.txt&quot;</span><span>, </span><span style="color:#d69d85;">&quot;Hello&quot;</span><span>).await; </span><span>} </span></code></pre> <p>This function asynchronously writes the string “Hello” to a <code>foo.txt</code> file. Since hard disk writes take some time, the first <code>poll</code> call on this future will likely return <code>Poll::Pending</code>. However, the hard disk driver will internally store the <code>Waker</code> passed to the <code>poll</code> call and use it to notify the executor when the file is written to disk. 
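</p> <p>The interplay between the future, the driver, and the waker can be sketched roughly as follows. Everything here is a simplified assumption for illustration: <code>WriteFuture</code>, <code>Shared</code>, <code>complete_write</code>, and <code>CountingWaker</code> are hypothetical names, there is no real disk driver, and the sketch uses <code>std</code> (including the <code>std::task::Wake</code> trait as a convenient way to build a waker) so that it can run outside the kernel:</p>

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// State shared between the future and the (hypothetical) driver.
struct Shared {
    done: bool,
    waker: Option<Waker>,
}

/// Hypothetical stand-in for the future behind `async_write_file`.
struct WriteFuture {
    shared: Arc<Mutex<Shared>>,
}

impl Future for WriteFuture {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut shared = self.shared.lock().unwrap();
        if shared.done {
            Poll::Ready(())
        } else {
            // Store the waker so that the driver can notify the executor later.
            shared.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

/// What the driver would do once the write has finished.
fn complete_write(shared: &Arc<Mutex<Shared>>) {
    let mut guard = shared.lock().unwrap();
    guard.done = true;
    let waker = guard.waker.take();
    drop(guard);
    if let Some(waker) = waker {
        waker.wake();
    }
}

/// Test waker that only counts how often it was notified.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Returns (first poll pending?, number of wake-ups, second poll ready?).
fn demo() -> (bool, usize, bool) {
    let shared = Arc::new(Mutex::new(Shared { done: false, waker: None }));
    let mut future = Box::pin(WriteFuture { shared: shared.clone() });
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);

    let first_pending = future.as_mut().poll(&mut cx).is_pending();
    complete_write(&shared);
    let wake_count = counter.0.load(Ordering::SeqCst);
    let second_ready = future.as_mut().poll(&mut cx).is_ready();
    (first_pending, wake_count, second_ready)
}

fn main() {
    assert_eq!(demo(), (true, 1, true));
}
```

<p>A real executor would react to the wake-up by queueing the task for another <code>poll</code> instead of just counting it.</p> <p>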
This way, the executor does not need to waste any time trying to <code>poll</code> the future again before it receives the waker notification.</p> <p>We will see how the <code>Waker</code> type works in detail when we create our own executor with waker support in the implementation section of this post.</p> <h3 id="cooperative-multitasking-1"><a class="zola-anchor" href="#cooperative-multitasking-1" aria-label="Anchor link for: cooperative-multitasking-1">🔗</a>Cooperative Multitasking?</h3> <p>At the beginning of this post, we talked about preemptive and cooperative multitasking. While preemptive multitasking relies on the operating system to forcibly switch between running tasks, cooperative multitasking requires that the tasks voluntarily give up control of the CPU through a <em>yield</em> operation on a regular basis. The big advantage of the cooperative approach is that tasks can save their state themselves, which results in more efficient context switches and makes it possible to share the same call stack between tasks.</p> <p>It might not be immediately apparent, but futures and async/await are an implementation of the cooperative multitasking pattern:</p> <ul> <li>Each future that is added to the executor is basically a cooperative task.</li> <li>Instead of using an explicit yield operation, futures give up control of the CPU core by returning <code>Poll::Pending</code> (or <code>Poll::Ready</code> at the end). <ul> <li>There is nothing that forces futures to give up the CPU. If they want, they can never return from <code>poll</code>, e.g., by spinning endlessly in a loop.</li> <li>Since each future can block the execution of the other futures in the executor, we need to trust them to not be malicious.</li> </ul> </li> <li>Futures internally store all the state they need to continue execution on the next <code>poll</code> call. 
With async/await, the compiler automatically detects all variables that are needed and stores them inside the generated state machine. <ul> <li>Only the minimum state required for continuation is saved.</li> <li>Since the <code>poll</code> method gives up the call stack when it returns, the same stack can be used for polling other futures.</li> </ul> </li> </ul> <p>We see that futures and async/await fit the cooperative multitasking pattern perfectly; they just use some different terminology. In the following, we will therefore use the terms “task” and “future” interchangeably.</p> <h2 id="implementation"><a class="zola-anchor" href="#implementation" aria-label="Anchor link for: implementation">🔗</a>Implementation</h2> <p>Now that we understand how cooperative multitasking based on futures and async/await works in Rust, it’s time to add support for it to our kernel. Since the <a href="https://doc.rust-lang.org/nightly/core/future/trait.Future.html"><code>Future</code></a> trait is part of the <code>core</code> library and async/await is a feature of the language itself, there is nothing special we need to do to use it in our <code>#![no_std]</code> kernel. 
The only requirement is that we use at least nightly <code>2020-03-25</code> of Rust because async/await was not <code>no_std</code> compatible before.</p> <p>With a recent-enough nightly, we can start using async/await in our <code>main.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>async </span><span style="color:#569cd6;">fn </span><span>async_number() -&gt; </span><span style="color:#569cd6;">u32 </span><span>{ </span><span> </span><span style="color:#b5cea8;">42 </span><span>} </span><span> </span><span>async </span><span style="color:#569cd6;">fn </span><span>example_task() { </span><span> </span><span style="color:#569cd6;">let</span><span> number = async_number().await; </span><span> println!(</span><span style="color:#d69d85;">&quot;async number: </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, number); </span><span>} </span></code></pre> <p>The <code>async_number</code> function is an <code>async fn</code>, so the compiler transforms it into a state machine that implements <code>Future</code>. Since the function only returns <code>42</code>, the resulting future will directly return <code>Poll::Ready(42)</code> on the first <code>poll</code> call. Like <code>async_number</code>, the <code>example_task</code> function is also an <code>async fn</code>. It awaits the number returned by <code>async_number</code> and then prints it using the <code>println</code> macro.</p> <p>To run the future returned by <code>example_task</code>, we need to call <code>poll</code> on it until it signals its completion by returning <code>Poll::Ready</code>. 
To do this, we need to create a simple executor type.</p> <h3 id="task"><a class="zola-anchor" href="#task" aria-label="Anchor link for: task">🔗</a>Task</h3> <p>Before we start the executor implementation, we create a new <code>task</code> module with a <code>Task</code> type:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">pub mod </span><span>task; </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/mod.rs </span><span> </span><span style="color:#569cd6;">use </span><span>core::{future::Future, pin::Pin}; </span><span style="color:#569cd6;">use </span><span>alloc::boxed::Box; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>Task { </span><span> future: Pin&lt;Box&lt;dyn Future&lt;Output = ()&gt;&gt;&gt;, </span><span>} </span></code></pre> <p>The <code>Task</code> struct is a newtype wrapper around a pinned, heap-allocated, and dynamically dispatched future with the empty type <code>()</code> as output. Let’s go through it in detail:</p> <ul> <li>We require that the future associated with a task returns <code>()</code>. This means that tasks don’t return any result, they are just executed for their side effects. For example, the <code>example_task</code> function we defined above has no return value, but it prints something to the screen as a side effect.</li> <li>The <code>dyn</code> keyword indicates that we store a <a href="https://doc.rust-lang.org/book/ch17-02-trait-objects.html"><em>trait object</em></a> in the <code>Box</code>. 
This means that the methods on the future are <a href="https://doc.rust-lang.org/book/ch17-02-trait-objects.html#trait-objects-perform-dynamic-dispatch"><em>dynamically dispatched</em></a>, allowing different types of futures to be stored in the <code>Task</code> type. This is important because each <code>async fn</code> has its own type and we want to be able to create multiple different tasks.</li> <li>As we learned in the <a href="https://os.phil-opp.com/async-await/#pinning">section about pinning</a>, the <code>Pin&lt;Box&gt;</code> type ensures that a value cannot be moved in memory by placing it on the heap and preventing the creation of <code>&amp;mut</code> references to it. This is important because futures generated by async/await might be self-referential, i.e., contain pointers to themselves that would be invalidated when the future is moved.</li> </ul> <p>To allow the creation of new <code>Task</code> structs from futures, we create a <code>new</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/mod.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Task { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>new(future: impl Future&lt;Output = ()&gt; + </span><span style="color:#569cd6;">&#39;static</span><span>) -&gt; Task { </span><span> Task { </span><span> future: Box::pin(future), </span><span> } </span><span> } </span><span>} </span></code></pre> <p>The function takes an arbitrary future with an output type of <code>()</code> and pins it in memory through the <a href="https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.pin"><code>Box::pin</code></a> function. Then it wraps the boxed future in the <code>Task</code> struct and returns it. 
The <code>'static</code> lifetime is required here because the returned <code>Task</code> can live for an arbitrary time, so the future needs to be valid for that time too.</p> <p>We also add a <code>poll</code> method to allow the executor to poll the stored future:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/mod.rs </span><span> </span><span style="color:#569cd6;">use </span><span>core::task::{Context, Poll}; </span><span> </span><span style="color:#569cd6;">impl </span><span>Task { </span><span> </span><span style="color:#569cd6;">fn </span><span>poll(</span><span style="color:#569cd6;">&amp;mut </span><span>self, context: </span><span style="color:#569cd6;">&amp;mut</span><span> Context) -&gt; Poll&lt;()&gt; { </span><span> self.future.as_mut().poll(context) </span><span> } </span><span>} </span></code></pre> <p>Since the <a href="https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll"><code>poll</code></a> method of the <code>Future</code> trait expects to be called on a <code>Pin&lt;&amp;mut T&gt;</code> type, we use the <a href="https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.as_mut"><code>Pin::as_mut</code></a> method to convert the <code>self.future</code> field of type <code>Pin&lt;Box&lt;T&gt;&gt;</code> first. Then we call <code>poll</code> on the converted <code>self.future</code> field and return the result. Since the <code>Task::poll</code> method should only be called by the executor that we’ll create in a moment, we keep the function private to the <code>task</code> module.</p> <h3 id="simple-executor"><a class="zola-anchor" href="#simple-executor" aria-label="Anchor link for: simple-executor">🔗</a>Simple Executor</h3> <p>Since executors can be quite complex, we deliberately start by creating a very basic executor before implementing a more featureful executor later. 
For this, we first create a new <code>task::simple_executor</code> submodule:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/mod.rs </span><span> </span><span style="color:#569cd6;">pub mod </span><span>simple_executor; </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/simple_executor.rs </span><span> </span><span style="color:#569cd6;">use super</span><span>::Task; </span><span style="color:#569cd6;">use </span><span>alloc::collections::VecDeque; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>SimpleExecutor { </span><span> task_queue: VecDeque&lt;Task&gt;, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>SimpleExecutor { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>new() -&gt; SimpleExecutor { </span><span> SimpleExecutor { </span><span> task_queue: VecDeque::new(), </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>spawn(</span><span style="color:#569cd6;">&amp;mut </span><span>self, task: Task) { </span><span> self.task_queue.push_back(task) </span><span> } </span><span>} </span></code></pre> <p>The struct contains a single <code>task_queue</code> field of type <a href="https://doc.rust-lang.org/stable/alloc/collections/vec_deque/struct.VecDeque.html"><code>VecDeque</code></a>, which is basically a vector that allows for push and pop operations on both ends. The idea behind using this type is that we insert new tasks through the <code>spawn</code> method at the end and pop the next task for execution from the front. 
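</p> <p>The queue discipline on its own can be sketched with a few lines. The <code>NamedTask</code> and <code>Queue</code> types and the <code>next_task</code> method are hypothetical stand-ins that are not part of the post, and the sketch imports <code>VecDeque</code> from <code>std</code> instead of <code>alloc</code> so that it runs outside the kernel:</p>

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for `Task`: here a task is just a name.
struct NamedTask(&'static str);

struct Queue {
    task_queue: VecDeque<NamedTask>,
}

impl Queue {
    fn new() -> Queue {
        Queue { task_queue: VecDeque::new() }
    }

    /// New tasks are appended at the back of the queue.
    fn spawn(&mut self, task: NamedTask) {
        self.task_queue.push_back(task)
    }

    /// The next task to run is taken from the front of the queue.
    fn next_task(&mut self) -> Option<NamedTask> {
        self.task_queue.pop_front()
    }
}

/// Spawns three tasks and returns their names in execution order.
fn run_order() -> Vec<&'static str> {
    let mut queue = Queue::new();
    queue.spawn(NamedTask("first"));
    queue.spawn(NamedTask("second"));
    queue.spawn(NamedTask("third"));

    let mut order = Vec::new();
    while let Some(task) = queue.next_task() {
        order.push(task.0);
    }
    order
}

fn main() {
    assert_eq!(run_order(), ["first", "second", "third"]);
}
```

<p>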
This way, we get a simple <a href="https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)">FIFO queue</a> (<em>“first in, first out”</em>).</p> <h4 id="dummy-waker"><a class="zola-anchor" href="#dummy-waker" aria-label="Anchor link for: dummy-waker">🔗</a>Dummy Waker</h4> <p>In order to call the <code>poll</code> method, we need to create a <a href="https://doc.rust-lang.org/nightly/core/task/struct.Context.html"><code>Context</code></a> type, which wraps a <a href="https://doc.rust-lang.org/nightly/core/task/struct.Waker.html"><code>Waker</code></a> type. To start simple, we will first create a dummy waker that does nothing. For this, we create a <a href="https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html"><code>RawWaker</code></a> instance, which defines the implementation of the different <code>Waker</code> methods, and then use the <a href="https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.from_raw"><code>Waker::from_raw</code></a> function to turn it into a <code>Waker</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/simple_executor.rs </span><span> </span><span style="color:#569cd6;">use </span><span>core::task::{Waker, RawWaker}; </span><span> </span><span style="color:#569cd6;">fn </span><span>dummy_raw_waker() -&gt; RawWaker { </span><span> todo!(); </span><span>} </span><span> </span><span style="color:#569cd6;">fn </span><span>dummy_waker() -&gt; Waker { </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ Waker::from_raw(dummy_raw_waker()) } </span><span>} </span></code></pre> <p>The <code>from_raw</code> function is unsafe because undefined behavior can occur if the programmer does not uphold the documented requirements of <code>RawWaker</code>. 
Before we look at the implementation of the <code>dummy_raw_waker</code> function, we first try to understand how the <code>RawWaker</code> type works.</p> <h5 id="rawwaker"><a class="zola-anchor" href="#rawwaker" aria-label="Anchor link for: rawwaker">🔗</a><code>RawWaker</code></h5> <p>The <a href="https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html"><code>RawWaker</code></a> type requires the programmer to explicitly define a <a href="https://en.wikipedia.org/wiki/Virtual_method_table"><em>virtual method table</em></a> (<em>vtable</em>) that specifies the functions that should be called when the <code>RawWaker</code> is cloned, woken, or dropped. The layout of this vtable is defined by the <a href="https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html"><code>RawWakerVTable</code></a> type. Each function receives a <code>*const ()</code> argument, which is a <em>type-erased</em> pointer to some value. The reason for using a <code>*const ()</code> pointer instead of a proper reference is that the <code>RawWaker</code> type should be non-generic but still support arbitrary types. The pointer is provided by putting it into the <code>data</code> argument of <a href="https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html#method.new"><code>RawWaker::new</code></a>, which just initializes a <code>RawWaker</code>. The <code>Waker</code> then uses this <code>RawWaker</code> to call the vtable functions with <code>data</code>.</p> <p>Typically, the <code>RawWaker</code> is created for some heap-allocated struct that is wrapped into the <a href="https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html"><code>Box</code></a> or <a href="https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html"><code>Arc</code></a> type. 
For such types, methods like <a href="https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html#method.into_raw"><code>Box::into_raw</code></a> can be used to convert the <code>Box&lt;T&gt;</code> to a <code>*const T</code> pointer. This pointer can then be cast to a type-erased <code>*const ()</code> pointer and passed to <code>RawWaker::new</code>. Since each vtable function receives the same <code>*const ()</code> as an argument, the functions can safely cast the pointer back to a <code>Box&lt;T&gt;</code> or a <code>&amp;T</code> to operate on it. As you can imagine, this process is highly dangerous, and mistakes can easily lead to undefined behavior. For this reason, manually creating a <code>RawWaker</code> is not recommended unless necessary.</p> <h5 id="a-dummy-rawwaker"><a class="zola-anchor" href="#a-dummy-rawwaker" aria-label="Anchor link for: a-dummy-rawwaker">🔗</a>A Dummy <code>RawWaker</code></h5> <p>While manually creating a <code>RawWaker</code> is not recommended, there is currently no other way to create a dummy <code>Waker</code> that does nothing. 
Fortunately, the fact that we want to do nothing makes it relatively safe to implement the <code>dummy_raw_waker</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/simple_executor.rs </span><span> </span><span style="color:#569cd6;">use </span><span>core::task::RawWakerVTable; </span><span> </span><span style="color:#569cd6;">fn </span><span>dummy_raw_waker() -&gt; RawWaker { </span><span> </span><span style="color:#569cd6;">fn </span><span>no_op(</span><span style="color:#569cd6;">_</span><span>: </span><span style="color:#569cd6;">*const </span><span>()) {} </span><span> </span><span style="color:#569cd6;">fn </span><span>clone(</span><span style="color:#569cd6;">_</span><span>: </span><span style="color:#569cd6;">*const </span><span>()) -&gt; RawWaker { </span><span> dummy_raw_waker() </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> vtable = </span><span style="color:#569cd6;">&amp;</span><span>RawWakerVTable::new(clone, no_op, no_op, no_op); </span><span> RawWaker::new(</span><span style="color:#b5cea8;">0 </span><span style="color:#569cd6;">as *const </span><span>(), vtable) </span><span>} </span></code></pre> <p>First, we define two inner functions named <code>no_op</code> and <code>clone</code>. The <code>no_op</code> function takes a <code>*const ()</code> pointer and does nothing. The <code>clone</code> function also takes a <code>*const ()</code> pointer and returns a new <code>RawWaker</code> by calling <code>dummy_raw_waker</code> again. We use these two functions to create a minimal <code>RawWakerVTable</code>: The <code>clone</code> function is used for the cloning operations, and the <code>no_op</code> function is used for all other operations. 
Since the <code>RawWaker</code> does nothing, it does not matter that we return a new <code>RawWaker</code> from <code>clone</code> instead of cloning it.</p> <p>After creating the <code>vtable</code>, we use the <a href="https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html#method.new"><code>RawWaker::new</code></a> function to create the <code>RawWaker</code>. The passed <code>*const ()</code> does not matter since none of the vtable functions use it. For this reason, we simply pass a null pointer.</p> <h4 id="a-run-method"><a class="zola-anchor" href="#a-run-method" aria-label="Anchor link for: a-run-method">🔗</a>A <code>run</code> Method</h4> <p>Now that we have a way to create a <code>Waker</code> instance, we can use it to implement a <code>run</code> method on our executor. The simplest <code>run</code> method repeatedly polls all queued tasks in a loop until all are done. This is not very efficient since it does not utilize the notifications of the <code>Waker</code> type, but it is an easy way to get things running:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/simple_executor.rs </span><span> </span><span style="color:#569cd6;">use </span><span>core::task::{Context, Poll}; </span><span> </span><span style="color:#569cd6;">impl </span><span>SimpleExecutor { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>run(</span><span style="color:#569cd6;">&amp;mut </span><span>self) { </span><span> </span><span style="color:#569cd6;">while let </span><span>Some(</span><span style="color:#569cd6;">mut</span><span> task) = self.task_queue.pop_front() { </span><span> </span><span style="color:#569cd6;">let</span><span> waker = dummy_waker(); </span><span> </span><span style="color:#569cd6;">let mut</span><span> context = Context::from_waker(</span><span 
style="color:#569cd6;">&amp;</span><span>waker); </span><span> </span><span style="color:#569cd6;">match</span><span> task.poll(</span><span style="color:#569cd6;">&amp;mut</span><span> context) { </span><span> Poll::Ready(()) </span><span style="color:#569cd6;">=&gt; </span><span>{} </span><span style="color:#608b4e;">// task done </span><span> Poll::Pending </span><span style="color:#569cd6;">=&gt; </span><span>self.task_queue.push_back(task), </span><span> } </span><span> } </span><span> } </span><span>} </span></code></pre> <p>The function uses a <code>while let</code> loop to handle all tasks in the <code>task_queue</code>. For each task, it first creates a <code>Context</code> type by wrapping a <code>Waker</code> instance returned by our <code>dummy_waker</code> function. Then it invokes the <code>Task::poll</code> method with this <code>context</code>. If the <code>poll</code> method returns <code>Poll::Ready</code>, the task is finished and we can continue with the next task. If the task is still <code>Poll::Pending</code>, we add it to the back of the queue again so that it will be polled again in a subsequent loop iteration.</p> <h4 id="trying-it"><a class="zola-anchor" href="#trying-it" aria-label="Anchor link for: trying-it">🔗</a>Trying It</h4> <p>With our <code>SimpleExecutor</code> type, we can now try running the task returned by the <code>example_task</code> function in our <code>main.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::task::{Task, simple_executor::SimpleExecutor}; </span><span> </span><span style="color:#569cd6;">fn </span><span>kernel_main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#608b4e;">// […] initialization routines, including `init_heap` </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> executor = SimpleExecutor::new(); </span><span> executor.spawn(Task::new(example_task())); </span><span> executor.run(); </span><span> </span><span> </span><span style="color:#608b4e;">// […] test_main, &quot;it did not crash&quot; message, hlt_loop </span><span>} </span><span> </span><span> </span><span style="color:#608b4e;">// Below is the example_task function again so that you don&#39;t have to scroll up </span><span> </span><span>async </span><span style="color:#569cd6;">fn </span><span>async_number() -&gt; </span><span style="color:#569cd6;">u32 </span><span>{ </span><span> </span><span style="color:#b5cea8;">42 </span><span>} </span><span> </span><span>async </span><span style="color:#569cd6;">fn </span><span>example_task() { </span><span> </span><span style="color:#569cd6;">let</span><span> number = async_number().await; </span><span> println!(</span><span style="color:#d69d85;">&quot;async number: </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, number); </span><span>} </span></code></pre> <p>When we run it, we see that the expected <em>“async number: 42”</em> message is printed to the screen:</p> <p><img src="https://os.phil-opp.com/async-await/qemu-simple-executor.png" alt="QEMU printing “Hello World”, “async number: 42”, and “It did not crash!”" /></p> <p>Let’s summarize the various steps that happen in this example:</p> <ul> <li>First, a new instance of our <code>SimpleExecutor</code> type is created with an empty <code>task_queue</code>.</li> <li>Next, we call the asynchronous <code>example_task</code> function, which returns a future. 
We wrap this future in the <code>Task</code> type, which moves it to the heap and pins it, and then add the task to the <code>task_queue</code> of the executor through the <code>spawn</code> method.</li> <li>We then call the <code>run</code> method to start the execution of the single task in the queue. This involves: <ul> <li>Popping the task from the front of the <code>task_queue</code>.</li> <li>Creating a <code>RawWaker</code> for the task, converting it to a <a href="https://doc.rust-lang.org/nightly/core/task/struct.Waker.html"><code>Waker</code></a> instance, and then creating a <a href="https://doc.rust-lang.org/nightly/core/task/struct.Context.html"><code>Context</code></a> instance from it.</li> <li>Calling the <a href="https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll"><code>poll</code></a> method on the future of the task, using the <code>Context</code> we just created.</li> <li>Since the <code>example_task</code> does not wait for anything, it can run to completion on the first <code>poll</code> call. This is where the <em>“async number: 42”</em> line is printed.</li> <li>Since the <code>example_task</code> directly returns <code>Poll::Ready</code>, it is not added back to the task queue.</li> </ul> </li> <li>The <code>run</code> method returns after the <code>task_queue</code> becomes empty. The execution of our <code>kernel_main</code> function continues and the <em>“It did not crash!”</em> message is printed.</li> </ul> <h3 id="async-keyboard-input"><a class="zola-anchor" href="#async-keyboard-input" aria-label="Anchor link for: async-keyboard-input">🔗</a>Async Keyboard Input</h3> <p>Our simple executor does not utilize the <code>Waker</code> notifications and simply loops over all tasks until they are done. This wasn’t a problem for our example since our <code>example_task</code> can run to completion on the first <code>poll</code> call. 
To see the performance advantages of a proper <code>Waker</code> implementation, we first need to create a task that is truly asynchronous, i.e., a task that will probably return <code>Poll::Pending</code> on the first <code>poll</code> call.</p> <p>We already have some kind of asynchronicity in our system that we can use for this: hardware interrupts. As we learned in the <a href="https://os.phil-opp.com/hardware-interrupts/"><em>Interrupts</em></a> post, hardware interrupts can occur at arbitrary points in time, determined by some external device. For example, a hardware timer sends an interrupt to the CPU after some predefined time has elapsed. When the CPU receives an interrupt, it immediately transfers control to the corresponding handler function defined in the interrupt descriptor table (IDT).</p> <p>In the following, we will create an asynchronous task based on the keyboard interrupt. The keyboard interrupt is a good candidate for this because it is both non-deterministic and latency-critical. Non-deterministic means that there is no way to predict when the next key press will occur because it is entirely dependent on the user. Latency-critical means that we want to handle the keyboard input in a timely manner, otherwise the user will feel a lag. To support such a task in an efficient way, it will be essential that the executor has proper support for <code>Waker</code> notifications.</p> <h4 id="scancode-queue"><a class="zola-anchor" href="#scancode-queue" aria-label="Anchor link for: scancode-queue">🔗</a>Scancode Queue</h4> <p>Currently, we handle the keyboard input directly in the interrupt handler. This is not a good idea for the long term because interrupt handlers should stay as short as possible as they might interrupt important work. 
Instead, interrupt handlers should only perform the minimal amount of work necessary (e.g., reading the keyboard scancode) and leave the rest of the work (e.g., interpreting the scancode) to a background task.</p> <p>A common pattern for delegating work to a background task is to create some sort of queue. The interrupt handler pushes units of work to the queue, and the background task handles the work in the queue. Applied to our keyboard interrupt, this means that the interrupt handler only reads the scancode from the keyboard, pushes it to the queue, and then returns. The keyboard task sits on the other end of the queue and interprets and handles each scancode that is pushed to it:</p> <p><img src="https://os.phil-opp.com/async-await/scancode-queue.svg" alt="Scancode queue with 8 slots on the top. Keyboard interrupt handler on the bottom left with a “push scancode” arrow to the left of the queue. Keyboard task on the bottom right with a “pop scancode” arrow coming from the right side of the queue." /></p> <p>A simple implementation of that queue could be a mutex-protected <a href="https://doc.rust-lang.org/stable/alloc/collections/vec_deque/struct.VecDeque.html"><code>VecDeque</code></a>. However, using mutexes in interrupt handlers is not a good idea since it can easily lead to deadlocks. For example, when the user presses a key while the keyboard task has locked the queue, the interrupt handler tries to acquire the lock again and hangs indefinitely. Another problem with this approach is that <code>VecDeque</code> automatically increases its capacity by performing a new heap allocation when it becomes full. This can lead to deadlocks again because our allocator also uses a mutex internally. Further problems are that heap allocations can fail or take a considerable amount of time when the heap is fragmented.</p> <p>To prevent these problems, we need a queue implementation that does not require mutexes or allocations for its <code>push</code> operation. 
Such queues can be implemented by using lock-free <a href="https://doc.rust-lang.org/core/sync/atomic/index.html">atomic operations</a> for pushing and popping elements. This way, it is possible to create <code>push</code> and <code>pop</code> operations that only require a <code>&amp;self</code> reference and are thus usable without a mutex. To avoid allocations on <code>push</code>, the queue can be backed by a pre-allocated fixed-size buffer. While this makes the queue <em>bounded</em> (i.e., it has a maximum length), it is often possible to define reasonable upper bounds for the queue length in practice, so that this isn’t a big problem.</p> <h5 id="the-crossbeam-crate"><a class="zola-anchor" href="#the-crossbeam-crate" aria-label="Anchor link for: the-crossbeam-crate">🔗</a>The <code>crossbeam</code> Crate</h5> <p>Implementing such a queue in a correct and efficient way is very difficult, so I recommend sticking to existing, well-tested implementations. One popular Rust project that implements various mutex-free types for concurrent programming is <a href="https://github.com/crossbeam-rs/crossbeam"><code>crossbeam</code></a>. It provides a type named <a href="https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html"><code>ArrayQueue</code></a> that is exactly what we need in this case. 
And we’re lucky: the type is fully compatible with <code>no_std</code> crates with allocation support.</p> <p>To use the type, we need to add a dependency on the <code>crossbeam-queue</code> crate:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies.crossbeam-queue</span><span>] </span><span style="color:#569cd6;">version </span><span>= </span><span style="color:#d69d85;">&quot;0.3.11&quot; </span><span style="color:#569cd6;">default-features </span><span>= </span><span style="color:#569cd6;">false </span><span style="color:#569cd6;">features </span><span>= [</span><span style="color:#d69d85;">&quot;alloc&quot;</span><span>] </span></code></pre> <p>By default, the crate depends on the standard library. To make it <code>no_std</code> compatible, we need to disable its default features and instead enable the <code>alloc</code> feature. 
<span class="gray">(Note that we could also add a dependency on the main <code>crossbeam</code> crate, which re-exports the <code>crossbeam-queue</code> crate, but this would result in a larger number of dependencies and longer compile times.)</span></p> <h5 id="queue-implementation"><a class="zola-anchor" href="#queue-implementation" aria-label="Anchor link for: queue-implementation">🔗</a>Queue Implementation</h5> <p>Using the <code>ArrayQueue</code> type, we can now create a global scancode queue in a new <code>task::keyboard</code> module:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/mod.rs </span><span> </span><span style="color:#569cd6;">pub mod </span><span>keyboard; </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/keyboard.rs </span><span> </span><span style="color:#569cd6;">use </span><span>conquer_once::spin::OnceCell; </span><span style="color:#569cd6;">use </span><span>crossbeam_queue::ArrayQueue; </span><span> </span><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">SCANCODE_QUEUE</span><span>: OnceCell&lt;ArrayQueue&lt;</span><span style="color:#569cd6;">u8</span><span>&gt;&gt; = OnceCell::uninit(); </span></code></pre> <p>Since <a href="https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.new"><code>ArrayQueue::new</code></a> performs a heap allocation, which is not possible at compile time (<a href="https://github.com/rust-lang/const-eval/issues/20">yet</a>), we can’t initialize the static variable directly. 
Instead, we use the <a href="https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html"><code>OnceCell</code></a> type of the <a href="https://docs.rs/conquer-once/0.2.0/conquer_once/index.html"><code>conquer_once</code></a> crate, which makes it possible to perform a safe one-time initialization of static values. To include the crate, we need to add it as a dependency in our <code>Cargo.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies.conquer-once</span><span>] </span><span style="color:#569cd6;">version </span><span>= </span><span style="color:#d69d85;">&quot;0.2.0&quot; </span><span style="color:#569cd6;">default-features </span><span>= </span><span style="color:#569cd6;">false </span></code></pre> <p>Instead of the <a href="https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html"><code>OnceCell</code></a> primitive, we could also use the <a href="https://docs.rs/lazy_static/1.4.0/lazy_static/index.html"><code>lazy_static</code></a> macro here. 
However, the <code>OnceCell</code> type has the advantage that we can ensure that the initialization does not happen in the interrupt handler, thus preventing the interrupt handler from performing a heap allocation.</p> <h4 id="filling-the-queue"><a class="zola-anchor" href="#filling-the-queue" aria-label="Anchor link for: filling-the-queue">🔗</a>Filling the Queue</h4> <p>To fill the scancode queue, we create a new <code>add_scancode</code> function that we will call from the interrupt handler:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/keyboard.rs </span><span> </span><span style="color:#569cd6;">use crate</span><span>::println; </span><span> </span><span style="color:#608b4e;">/// Called by the keyboard interrupt handler </span><span style="color:#608b4e;">/// </span><span style="color:#608b4e;">/// Must not block or allocate. </span><span style="color:#569cd6;">pub</span><span>(</span><span style="color:#569cd6;">crate</span><span>) </span><span style="color:#569cd6;">fn </span><span>add_scancode(scancode: </span><span style="color:#569cd6;">u8</span><span>) { </span><span> </span><span style="color:#569cd6;">if let </span><span>Ok(queue) = </span><span style="color:#b4cea8;">SCANCODE_QUEUE</span><span>.try_get() { </span><span> </span><span style="color:#569cd6;">if let </span><span>Err(</span><span style="color:#569cd6;">_</span><span>) = queue.push(scancode) { </span><span> println!(</span><span style="color:#d69d85;">&quot;WARNING: scancode queue full; dropping keyboard input&quot;</span><span>); </span><span> } </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;WARNING: scancode queue uninitialized&quot;</span><span>); </span><span> } </span><span>} </span></code></pre> <p>We use <a 
href="https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html#method.try_get"><code>OnceCell::try_get</code></a> to get a reference to the initialized queue. If the queue is not initialized yet, we ignore the keyboard scancode and print a warning. It’s important that we don’t try to initialize the queue in this function because it will be called by the interrupt handler, which should not perform heap allocations. Since this function should not be callable from our <code>main.rs</code>, we use the <code>pub(crate)</code> visibility to make it only available to our <code>lib.rs</code>.</p> <p>The fact that the <a href="https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.push"><code>ArrayQueue::push</code></a> method requires only a <code>&amp;self</code> reference makes it very simple to call the method on the static queue. The <code>ArrayQueue</code> type performs all the necessary synchronization itself, so we don’t need a mutex wrapper here. In case the queue is full, we print a warning too.</p> <p>To call the <code>add_scancode</code> function on keyboard interrupts, we update our <code>keyboard_interrupt_handler</code> function in the <code>interrupts</code> module:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;x86-interrupt&quot; </span><span style="color:#569cd6;">fn </span><span>keyboard_interrupt_handler( </span><span> _stack_frame: InterruptStackFrame </span><span>) { </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::port::Port; </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> port = Port::new(</span><span style="color:#b5cea8;">0x60</span><span>); </span><span> </span><span 
style="color:#569cd6;">let</span><span> scancode: </span><span style="color:#569cd6;">u8 </span><span>= </span><span style="color:#569cd6;">unsafe </span><span>{ port.read() }; </span><span> </span><span style="color:#569cd6;">crate</span><span>::task::keyboard::add_scancode(scancode); </span><span style="color:#608b4e;">// new </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#b4cea8;">PICS</span><span>.lock() </span><span> .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8()); </span><span> } </span><span>} </span></code></pre> <p>We removed all the keyboard handling code from this function and instead added a call to the <code>add_scancode</code> function. The rest of the function stays the same as before.</p> <p>As expected, keypresses are no longer printed to the screen when we run our project using <code>cargo run</code> now. Instead, we see the warning that the scancode queue is uninitialized for every keystroke.</p> <h4 id="scancode-stream"><a class="zola-anchor" href="#scancode-stream" aria-label="Anchor link for: scancode-stream">🔗</a>Scancode Stream</h4> <p>To initialize the <code>SCANCODE_QUEUE</code> and read the scancodes from the queue in an asynchronous way, we create a new <code>ScancodeStream</code> type:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/keyboard.rs </span><span> </span><span style="color:#569cd6;">pub struct </span><span>ScancodeStream { </span><span> _private: (), </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>ScancodeStream { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>new() -&gt; </span><span style="color:#569cd6;">Self </span><span>{ </span><span> </span><span style="color:#b4cea8;">SCANCODE_QUEUE</span><span>.try_init_once(|| 
ArrayQueue::new(</span><span style="color:#b5cea8;">100</span><span>)) </span><span> .expect(</span><span style="color:#d69d85;">&quot;ScancodeStream::new should only be called once&quot;</span><span>); </span><span> ScancodeStream { _private: () } </span><span> } </span><span>} </span></code></pre> <p>The purpose of the <code>_private</code> field is to prevent construction of the struct from outside of the module. This makes the <code>new</code> function the only way to construct the type. In the function, we first try to initialize the <code>SCANCODE_QUEUE</code> static. We panic if it is already initialized to ensure that only a single <code>ScancodeStream</code> instance can be created.</p> <p>To make the scancodes available to asynchronous tasks, the next step is to implement a <code>poll</code>-like method that tries to pop the next scancode off the queue. While this sounds like we should implement the <a href="https://doc.rust-lang.org/nightly/core/future/trait.Future.html"><code>Future</code></a> trait for our type, this does not quite fit here. The problem is that the <code>Future</code> trait only abstracts over a single asynchronous value and expects that the <code>poll</code> method is not called again after it returns <code>Poll::Ready</code>. Our scancode queue, however, contains multiple asynchronous values, so it is okay to keep polling it.</p> <h5 id="the-stream-trait"><a class="zola-anchor" href="#the-stream-trait" aria-label="Anchor link for: the-stream-trait">🔗</a>The <code>Stream</code> Trait</h5> <p>Since types that yield multiple asynchronous values are common, the <a href="https://docs.rs/futures/0.3.4/futures/"><code>futures</code></a> crate provides a useful abstraction for such types: the <a href="https://rust-lang.github.io/async-book/05_streams/01_chapter.html"><code>Stream</code></a> trait. 
The trait is defined like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub trait </span><span>Stream { </span><span> </span><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">Item</span><span>; </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>poll_next(self: Pin&lt;</span><span style="color:#569cd6;">&amp;mut Self</span><span>&gt;, cx: </span><span style="color:#569cd6;">&amp;mut</span><span> Context) </span><span> -&gt; Poll&lt;Option&lt;</span><span style="color:#569cd6;">Self::</span><span>Item&gt;&gt;; </span><span>} </span></code></pre> <p>This definition is quite similar to the <a href="https://doc.rust-lang.org/nightly/core/future/trait.Future.html"><code>Future</code></a> trait, with the following differences:</p> <ul> <li>The associated type is named <code>Item</code> instead of <code>Output</code>.</li> <li>Instead of a <code>poll</code> method that returns <code>Poll&lt;Self::Output&gt;</code>, the <code>Stream</code> trait defines a <code>poll_next</code> method that returns a <code>Poll&lt;Option&lt;Self::Item&gt;&gt;</code> (note the additional <code>Option</code>).</li> </ul> <p>There is also a semantic difference: The <code>poll_next</code> method can be called repeatedly until it returns <code>Poll::Ready(None)</code> to signal that the stream is finished. 
In this regard, the method is similar to the <a href="https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html#tymethod.next"><code>Iterator::next</code></a> method, which also returns <code>None</code> after the last value.</p> <h5 id="implementing-stream"><a class="zola-anchor" href="#implementing-stream" aria-label="Anchor link for: implementing-stream">🔗</a>Implementing <code>Stream</code></h5> <p>Let’s implement the <code>Stream</code> trait for our <code>ScancodeStream</code> to provide the values of the <code>SCANCODE_QUEUE</code> in an asynchronous way. For this, we first need to add a dependency on the <code>futures-util</code> crate, which contains the <code>Stream</code> trait:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies.futures-util</span><span>] </span><span style="color:#569cd6;">version </span><span>= </span><span style="color:#d69d85;">&quot;0.3.4&quot; </span><span style="color:#569cd6;">default-features </span><span>= </span><span style="color:#569cd6;">false </span><span style="color:#569cd6;">features </span><span>= [</span><span style="color:#d69d85;">&quot;alloc&quot;</span><span>] </span></code></pre> <p>We disable the default features to make the crate <code>no_std</code> compatible and enable the <code>alloc</code> feature to make its allocation-based types available (we will need this later). 
<span class="gray">(Note that we could also add a dependency on the main <code>futures</code> crate, which re-exports the <code>futures-util</code> crate, but this would result in a larger number of dependencies and longer compile times.)</span></p> <p>Now we can import and implement the <code>Stream</code> trait:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/keyboard.rs </span><span> </span><span style="color:#569cd6;">use </span><span>core::{pin::Pin, task::{Poll, Context}}; </span><span style="color:#569cd6;">use </span><span>futures_util::stream::Stream; </span><span> </span><span style="color:#569cd6;">impl </span><span>Stream </span><span style="color:#569cd6;">for </span><span>ScancodeStream { </span><span> </span><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">Item </span><span>= </span><span style="color:#569cd6;">u8</span><span>; </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>poll_next(self: Pin&lt;</span><span style="color:#569cd6;">&amp;mut Self</span><span>&gt;, cx: </span><span style="color:#569cd6;">&amp;mut</span><span> Context) -&gt; Poll&lt;Option&lt;</span><span style="color:#569cd6;">u8</span><span>&gt;&gt; { </span><span> </span><span style="color:#569cd6;">let</span><span> queue = </span><span style="color:#b4cea8;">SCANCODE_QUEUE</span><span>.try_get().expect(</span><span style="color:#d69d85;">&quot;not initialized&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">match</span><span> queue.pop() { </span><span> Some(scancode) </span><span style="color:#569cd6;">=&gt; </span><span>Poll::Ready(Some(scancode)), </span><span> None </span><span style="color:#569cd6;">=&gt; </span><span>Poll::Pending, </span><span> } </span><span> } </span><span>} </span></code></pre> <p>We first use the <a 
href="https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html#method.try_get"><code>OnceCell::try_get</code></a> method to get a reference to the initialized scancode queue. This should never fail since we initialize the queue in the <code>new</code> function, so we can safely use the <code>expect</code> method to panic if it’s not initialized. Next, we use the <a href="https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.pop"><code>ArrayQueue::pop</code></a> method to try to get the next element from the queue. If it succeeds, we return the scancode wrapped in <code>Poll::Ready(Some(…))</code>. If it fails, it means that the queue is empty. In that case, we return <code>Poll::Pending</code>.</p> <h4 id="waker-support"><a class="zola-anchor" href="#waker-support" aria-label="Anchor link for: waker-support">🔗</a>Waker Support</h4> <p>Like the <code>Future::poll</code> method, the <code>Stream::poll_next</code> method requires the asynchronous task to notify the executor when it becomes ready after <code>Poll::Pending</code> is returned. This way, the executor does not need to poll the same task again until it is notified, which greatly reduces the performance overhead of waiting tasks.</p> <p>To send this notification, the task should extract the <a href="https://doc.rust-lang.org/nightly/core/task/struct.Waker.html"><code>Waker</code></a> from the passed <a href="https://doc.rust-lang.org/nightly/core/task/struct.Context.html"><code>Context</code></a> reference and store it somewhere. 
When the task becomes ready, it should invoke the <a href="https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.wake"><code>wake</code></a> method on the stored <code>Waker</code> to notify the executor that the task should be polled again.</p> <h5 id="atomicwaker"><a class="zola-anchor" href="#atomicwaker" aria-label="Anchor link for: atomicwaker">🔗</a>AtomicWaker</h5> <p>To implement the <code>Waker</code> notification for our <code>ScancodeStream</code>, we need a place where we can store the <code>Waker</code> between poll calls. We can’t store it as a field in the <code>ScancodeStream</code> itself because it needs to be accessible from the <code>add_scancode</code> function. The solution to this is to use a static variable of the <a href="https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html"><code>AtomicWaker</code></a> type provided by the <code>futures-util</code> crate. Like the <code>ArrayQueue</code> type, this type is based on atomic instructions and can be safely stored in a <code>static</code> and modified concurrently.</p> <p>Let’s use the <a href="https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html"><code>AtomicWaker</code></a> type to define a static <code>WAKER</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/keyboard.rs </span><span> </span><span style="color:#569cd6;">use </span><span>futures_util::task::AtomicWaker; </span><span> </span><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">WAKER</span><span>: AtomicWaker = AtomicWaker::new(); </span></code></pre> <p>The idea is that the <code>poll_next</code> implementation stores the current waker in this static, and the <code>add_scancode</code> function calls the <code>wake</code> function on it when a new scancode is added to the queue.</p> <h5 id="storing-a-waker"><a 
class="zola-anchor" href="#storing-a-waker" aria-label="Anchor link for: storing-a-waker">🔗</a>Storing a Waker</h5> <p>The contract defined by <code>poll</code>/<code>poll_next</code> requires the task to register a wakeup for the passed <code>Waker</code> when it returns <code>Poll::Pending</code>. Let’s modify our <code>poll_next</code> implementation to satisfy this requirement:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/keyboard.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Stream </span><span style="color:#569cd6;">for </span><span>ScancodeStream { </span><span> </span><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">Item </span><span>= </span><span style="color:#569cd6;">u8</span><span>; </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>poll_next(self: Pin&lt;</span><span style="color:#569cd6;">&amp;mut Self</span><span>&gt;, cx: </span><span style="color:#569cd6;">&amp;mut</span><span> Context) -&gt; Poll&lt;Option&lt;</span><span style="color:#569cd6;">u8</span><span>&gt;&gt; { </span><span> </span><span style="color:#569cd6;">let</span><span> queue = </span><span style="color:#b4cea8;">SCANCODE_QUEUE </span><span> .try_get() </span><span> .expect(</span><span style="color:#d69d85;">&quot;scancode queue not initialized&quot;</span><span>); </span><span> </span><span> </span><span style="color:#608b4e;">// fast path </span><span> </span><span style="color:#569cd6;">if let </span><span>Some(scancode) = queue.pop() { </span><span> </span><span style="color:#569cd6;">return </span><span>Poll::Ready(Some(scancode)); </span><span> } </span><span> </span><span> </span><span style="color:#b4cea8;">WAKER</span><span>.register(</span><span style="color:#569cd6;">&amp;</span><span>cx.waker()); </span><span> </span><span 
style="color:#569cd6;">match</span><span> queue.pop() { </span><span> Some(scancode) </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#b4cea8;">WAKER</span><span>.take(); </span><span> Poll::Ready(Some(scancode)) </span><span> } </span><span> None </span><span style="color:#569cd6;">=&gt; </span><span>Poll::Pending, </span><span> } </span><span> } </span><span>} </span></code></pre> <p>Like before, we first use the <a href="https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html#method.try_get"><code>OnceCell::try_get</code></a> function to get a reference to the initialized scancode queue. We then optimistically try to <code>pop</code> from the queue and return <code>Poll::Ready</code> when it succeeds. This way, we can avoid the performance overhead of registering a waker when the queue is not empty.</p> <p>If the first call to <code>queue.pop()</code> does not succeed, the queue is potentially empty. Only potentially because the interrupt handler might have filled the queue asynchronously immediately after the check. Since this race condition can occur again for the next check, we need to register the <code>Waker</code> in the <code>WAKER</code> static before the second check. This way, a wakeup might happen before we return <code>Poll::Pending</code>, but it is guaranteed that we get a wakeup for any scancodes pushed after the check.</p> <p>After registering the <code>Waker</code> contained in the passed <a href="https://doc.rust-lang.org/nightly/core/task/struct.Context.html"><code>Context</code></a> through the <a href="https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html#method.register"><code>AtomicWaker::register</code></a> function, we try to pop from the queue a second time. If it now succeeds, we return <code>Poll::Ready</code>. 
We also remove the registered waker again using <a href="https://docs.rs/futures/0.3.4/futures/task/struct.AtomicWaker.html#method.take"><code>AtomicWaker::take</code></a> because a waker notification is no longer needed. In case <code>queue.pop()</code> fails for a second time, we return <code>Poll::Pending</code> like before, but this time with a registered wakeup.</p> <p>Note that there are two ways that a wakeup can happen for a task that did not return <code>Poll::Pending</code> (yet). One way is the mentioned race condition when the wakeup happens immediately before returning <code>Poll::Pending</code>. The other way is when the queue is no longer empty after registering the waker, so that <code>Poll::Ready</code> is returned. Since these spurious wakeups are not preventable, the executor needs to be able to handle them correctly.</p> <h5 id="waking-the-stored-waker"><a class="zola-anchor" href="#waking-the-stored-waker" aria-label="Anchor link for: waking-the-stored-waker">🔗</a>Waking the Stored Waker</h5> <p>To wake the stored <code>Waker</code>, we add a call to <code>WAKER.wake()</code> in the <code>add_scancode</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/keyboard.rs </span><span> </span><span style="color:#569cd6;">pub</span><span>(</span><span style="color:#569cd6;">crate</span><span>) </span><span style="color:#569cd6;">fn </span><span>add_scancode(scancode: </span><span style="color:#569cd6;">u8</span><span>) { </span><span> </span><span style="color:#569cd6;">if let </span><span>Ok(queue) = </span><span style="color:#b4cea8;">SCANCODE_QUEUE</span><span>.try_get() { </span><span> </span><span style="color:#569cd6;">if let </span><span>Err(</span><span style="color:#569cd6;">_</span><span>) = queue.push(scancode) { </span><span> println!(</span><span style="color:#d69d85;">&quot;WARNING: scancode 
queue full; dropping keyboard input&quot;</span><span>); </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> </span><span style="color:#b4cea8;">WAKER</span><span>.wake(); </span><span style="color:#608b4e;">// new </span><span> } </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;WARNING: scancode queue uninitialized&quot;</span><span>); </span><span> } </span><span>} </span></code></pre> <p>The only change that we made is to add a call to <code>WAKER.wake()</code> if the push to the scancode queue succeeds. If a waker is registered in the <code>WAKER</code> static, this method will call the equally-named <a href="https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.wake"><code>wake</code></a> method on it, which notifies the executor. Otherwise, the operation is a no-op, i.e., nothing happens.</p> <p>It is important that we call <code>wake</code> only after pushing to the queue because otherwise the task might be woken too early while the queue is still empty. This can, for example, happen when using a multi-threaded executor that starts the woken task concurrently on a different CPU core. 
While we don’t have thread support yet, we will add it soon and don’t want things to break then.</p> <h4 id="keyboard-task"><a class="zola-anchor" href="#keyboard-task" aria-label="Anchor link for: keyboard-task">🔗</a>Keyboard Task</h4> <p>Now that we implemented the <code>Stream</code> trait for our <code>ScancodeStream</code>, we can use it to create an asynchronous keyboard task:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/keyboard.rs </span><span> </span><span style="color:#569cd6;">use </span><span>futures_util::stream::StreamExt; </span><span style="color:#569cd6;">use </span><span>pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1}; </span><span style="color:#569cd6;">use crate</span><span>::print; </span><span> </span><span style="color:#569cd6;">pub</span><span> async </span><span style="color:#569cd6;">fn </span><span>print_keypresses() { </span><span> </span><span style="color:#569cd6;">let mut</span><span> scancodes = ScancodeStream::new(); </span><span> </span><span style="color:#569cd6;">let mut</span><span> keyboard = Keyboard::new(ScancodeSet1::new(), </span><span> layouts::Us104Key, HandleControl::Ignore); </span><span> </span><span> </span><span style="color:#569cd6;">while let </span><span>Some(scancode) = scancodes.next().await { </span><span> </span><span style="color:#569cd6;">if let </span><span>Ok(Some(key_event)) = keyboard.add_byte(scancode) { </span><span> </span><span style="color:#569cd6;">if let </span><span>Some(key) = keyboard.process_keyevent(key_event) { </span><span> </span><span style="color:#569cd6;">match</span><span> key { </span><span> DecodedKey::Unicode(character) </span><span style="color:#569cd6;">=&gt; </span><span>print!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, 
character), </span><span> DecodedKey::RawKey(key) </span><span style="color:#569cd6;">=&gt; </span><span>print!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, key), </span><span> } </span><span> } </span><span> } </span><span> } </span><span>} </span></code></pre> <p>The code is very similar to the code we had in our <a href="https://os.phil-opp.com/hardware-interrupts/#interpreting-the-scancodes">keyboard interrupt handler</a> before we modified it in this post. The only difference is that, instead of reading the scancode from an I/O port, we take it from the <code>ScancodeStream</code>. For this, we first create a new <code>ScancodeStream</code> and then repeatedly use the <a href="https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html#method.next"><code>next</code></a> method provided by the <a href="https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html"><code>StreamExt</code></a> trait to get a <code>Future</code> that resolves to the next element in the stream. By using the <code>await</code> operator on it, we asynchronously wait for the result of the future.</p> <p>We use <code>while let</code> to loop until the stream returns <code>None</code> to signal its end. 
Since our <code>poll_next</code> method never returns <code>None</code>, this is effectively an endless loop, so the <code>print_keypresses</code> task never finishes.</p> <p>Let’s add the <code>print_keypresses</code> task to our executor in our <code>main.rs</code> to get working keyboard input again:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::task::keyboard; </span><span style="color:#608b4e;">// new </span><span> </span><span style="color:#569cd6;">fn </span><span>kernel_main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span> </span><span style="color:#608b4e;">// […] initialization routines, including init_heap, test_main </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> executor = SimpleExecutor::new(); </span><span> executor.spawn(Task::new(example_task())); </span><span> executor.spawn(Task::new(keyboard::print_keypresses())); </span><span style="color:#608b4e;">// new </span><span> executor.run(); </span><span> </span><span> </span><span style="color:#608b4e;">// […] &quot;it did not crash&quot; message, hlt_loop </span><span>} </span></code></pre> <p>When we execute <code>cargo run</code> now, we see that keyboard input works again:</p> <p><img src="https://os.phil-opp.com/async-await/qemu-keyboard-output.gif" alt="QEMU printing “…..H…e…l…l..o….. …W..o..r….l…d…!”" /></p> <p>If you keep an eye on the CPU utilization of your computer, you will see that the <code>QEMU</code> process now continuously keeps the CPU busy. This happens because our <code>SimpleExecutor</code> polls tasks over and over again in a loop. 
So even if we don’t press any keys on the keyboard, the executor repeatedly calls <code>poll</code> on our <code>print_keypresses</code> task, even though the task cannot make any progress and will return <code>Poll::Pending</code> each time.</p> <h3 id="executor-with-waker-support"><a class="zola-anchor" href="#executor-with-waker-support" aria-label="Anchor link for: executor-with-waker-support">🔗</a>Executor with Waker Support</h3> <p>To fix the performance problem, we need to create an executor that properly utilizes the <code>Waker</code> notifications. This way, the executor is notified when the next keyboard interrupt occurs, so it does not need to keep polling the <code>print_keypresses</code> task over and over again.</p> <h4 id="task-id"><a class="zola-anchor" href="#task-id" aria-label="Anchor link for: task-id">🔗</a>Task Id</h4> <p>The first step in creating an executor with proper support for waker notifications is to give each task a unique ID. This is required because we need a way to specify which task should be woken. We start by creating a new <code>TaskId</code> wrapper type:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/mod.rs </span><span> </span><span>#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)] </span><span style="color:#569cd6;">struct </span><span>TaskId(</span><span style="color:#569cd6;">u64</span><span>); </span></code></pre> <p>The <code>TaskId</code> struct is a simple wrapper type around <code>u64</code>. We derive a number of traits for it to make it printable, copyable, comparable, and sortable. 
The latter is important because we want to use <code>TaskId</code> as the key type of a <a href="https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html"><code>BTreeMap</code></a> in a moment.</p> <p>To create a new unique ID, we create a <code>TaskId::new</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>core::sync::atomic::{AtomicU64, Ordering}; </span><span> </span><span style="color:#569cd6;">impl </span><span>TaskId { </span><span> </span><span style="color:#569cd6;">fn </span><span>new() -&gt; </span><span style="color:#569cd6;">Self </span><span>{ </span><span> </span><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">NEXT_ID</span><span>: AtomicU64 = AtomicU64::new(</span><span style="color:#b5cea8;">0</span><span>); </span><span> TaskId(</span><span style="color:#b4cea8;">NEXT_ID</span><span>.fetch_add(</span><span style="color:#b5cea8;">1</span><span>, Ordering::Relaxed)) </span><span> } </span><span>} </span></code></pre> <p>The function uses a static <code>NEXT_ID</code> variable of type <a href="https://doc.rust-lang.org/core/sync/atomic/struct.AtomicU64.html"><code>AtomicU64</code></a> to ensure that each ID is assigned only once. The <a href="https://doc.rust-lang.org/core/sync/atomic/struct.AtomicU64.html#method.fetch_add"><code>fetch_add</code></a> method increases the value and returns the previous value in one atomic operation. This means that even when the <code>TaskId::new</code> method is called in parallel, every ID is returned exactly once. The <a href="https://doc.rust-lang.org/core/sync/atomic/enum.Ordering.html"><code>Ordering</code></a> parameter defines whether the compiler is allowed to reorder the <code>fetch_add</code> operation in the instruction stream. 
Since we only require that the ID be unique, the <code>Relaxed</code> ordering with the weakest requirements is enough in this case.</p> <p>We can now extend our <code>Task</code> type with an additional <code>id</code> field:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/mod.rs </span><span> </span><span style="color:#569cd6;">pub struct </span><span>Task { </span><span> id: TaskId, </span><span style="color:#608b4e;">// new </span><span> future: Pin&lt;Box&lt;dyn Future&lt;Output = ()&gt;&gt;&gt;, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>Task { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>new(future: impl Future&lt;Output = ()&gt; + </span><span style="color:#569cd6;">&#39;static</span><span>) -&gt; Task { </span><span> Task { </span><span> id: TaskId::new(), </span><span style="color:#608b4e;">// new </span><span> future: Box::pin(future), </span><span> } </span><span> } </span><span>} </span></code></pre> <p>The new <code>id</code> field makes it possible to uniquely name a task, which is required for waking a specific task.</p> <h4 id="the-executor-type"><a class="zola-anchor" href="#the-executor-type" aria-label="Anchor link for: the-executor-type">🔗</a>The <code>Executor</code> Type</h4> <p>We create our new <code>Executor</code> type in a <code>task::executor</code> module:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/mod.rs </span><span> </span><span style="color:#569cd6;">pub mod </span><span>executor; </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/executor.rs 
</span><span> </span><span style="color:#569cd6;">use super</span><span>::{Task, TaskId}; </span><span style="color:#569cd6;">use </span><span>alloc::{collections::BTreeMap, sync::Arc}; </span><span style="color:#569cd6;">use </span><span>core::task::Waker; </span><span style="color:#569cd6;">use </span><span>crossbeam_queue::ArrayQueue; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>Executor { </span><span> tasks: BTreeMap&lt;TaskId, Task&gt;, </span><span> task_queue: Arc&lt;ArrayQueue&lt;TaskId&gt;&gt;, </span><span> waker_cache: BTreeMap&lt;TaskId, Waker&gt;, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>Executor { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>new() -&gt; </span><span style="color:#569cd6;">Self </span><span>{ </span><span> Executor { </span><span> tasks: BTreeMap::new(), </span><span> task_queue: Arc::new(ArrayQueue::new(</span><span style="color:#b5cea8;">100</span><span>)), </span><span> waker_cache: BTreeMap::new(), </span><span> } </span><span> } </span><span>} </span></code></pre> <p>Instead of storing tasks in a <a href="https://doc.rust-lang.org/stable/alloc/collections/vec_deque/struct.VecDeque.html"><code>VecDeque</code></a> like we did for our <code>SimpleExecutor</code>, we use a <code>task_queue</code> of task IDs and a <a href="https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html"><code>BTreeMap</code></a> named <code>tasks</code> that contains the actual <code>Task</code> instances. The map is indexed by the <code>TaskId</code> to allow efficient continuation of a specific task.</p> <p>The <code>task_queue</code> field is an <a href="https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html"><code>ArrayQueue</code></a> of task IDs, wrapped into the <a href="https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html"><code>Arc</code></a> type that implements <em>reference counting</em>. 
Reference counting makes it possible to share ownership of the value among multiple owners. It works by allocating the value on the heap and counting the number of active references to it. When the number of active references reaches zero, the value is no longer needed and can be deallocated.</p> <p>We use this <code>Arc&lt;ArrayQueue&gt;</code> type for the <code>task_queue</code> because it will be shared between the executor and wakers. The idea is that the wakers push the ID of the woken task to the queue. The executor sits on the receiving end of the queue, retrieves the woken tasks by their ID from the <code>tasks</code> map, and then runs them. The reason for using a fixed-size queue instead of an unbounded queue such as <a href="https://docs.rs/crossbeam-queue/0.2.1/crossbeam_queue/struct.SegQueue.html"><code>SegQueue</code></a> is that interrupt handlers should not allocate on push to this queue.</p> <p>In addition to the <code>task_queue</code> and the <code>tasks</code> map, the <code>Executor</code> type has a <code>waker_cache</code> field that is also a map. This map caches the <a href="https://doc.rust-lang.org/nightly/core/task/struct.Waker.html"><code>Waker</code></a> of a task after its creation. There are two reasons for this: First, it improves performance by reusing the same waker for multiple wake-ups of the same task instead of creating a new waker each time. Second, it ensures that reference-counted wakers are not deallocated inside interrupt handlers, because this could lead to deadlocks (there are more details on this below).</p> <p>To create an <code>Executor</code>, we provide a simple <code>new</code> function. We choose a capacity of 100 for the <code>task_queue</code>, which should be more than enough for the foreseeable future. 
If our system ever has more than 100 concurrent tasks at some point, we can easily increase this size.</p> <h4 id="spawning-tasks"><a class="zola-anchor" href="#spawning-tasks" aria-label="Anchor link for: spawning-tasks">🔗</a>Spawning Tasks</h4> <p>As with the <code>SimpleExecutor</code>, we provide a <code>spawn</code> method on our <code>Executor</code> type that adds a given task to the <code>tasks</code> map and immediately wakes it by pushing its ID to the <code>task_queue</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/executor.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Executor { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>spawn(</span><span style="color:#569cd6;">&amp;mut </span><span>self, task: Task) { </span><span> </span><span style="color:#569cd6;">let</span><span> task_id = task.id; </span><span> </span><span style="color:#569cd6;">if </span><span>self.tasks.insert(task.id, task).is_some() { </span><span> panic!(</span><span style="color:#d69d85;">&quot;task with same ID already in tasks&quot;</span><span>); </span><span> } </span><span> self.task_queue.push(task_id).expect(</span><span style="color:#d69d85;">&quot;queue full&quot;</span><span>); </span><span> } </span><span>} </span></code></pre> <p>If there is already a task with the same ID in the map, the <a href="https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.insert"><code>BTreeMap::insert</code></a> method returns it. This should never happen since each task has a unique ID, so we panic in this case since it indicates a bug in our code. 
Similarly, we panic when the <code>task_queue</code> is full since this should never happen if we choose a large-enough queue size.</p> <h4 id="running-tasks"><a class="zola-anchor" href="#running-tasks" aria-label="Anchor link for: running-tasks">🔗</a>Running Tasks</h4> <p>To execute all tasks in the <code>task_queue</code>, we create a private <code>run_ready_tasks</code> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/executor.rs </span><span> </span><span style="color:#569cd6;">use </span><span>core::task::{Context, Poll}; </span><span> </span><span style="color:#569cd6;">impl </span><span>Executor { </span><span> </span><span style="color:#569cd6;">fn </span><span>run_ready_tasks(</span><span style="color:#569cd6;">&amp;mut </span><span>self) { </span><span> </span><span style="color:#608b4e;">// destructure `self` to avoid borrow checker errors </span><span> </span><span style="color:#569cd6;">let Self </span><span>{ </span><span> tasks, </span><span> task_queue, </span><span> waker_cache, </span><span> } = self; </span><span> </span><span> </span><span style="color:#569cd6;">while let </span><span>Some(task_id) = task_queue.pop() { </span><span> </span><span style="color:#569cd6;">let</span><span> task = </span><span style="color:#569cd6;">match</span><span> tasks.get_mut(</span><span style="color:#569cd6;">&amp;</span><span>task_id) { </span><span> Some(task) </span><span style="color:#569cd6;">=&gt;</span><span> task, </span><span> None </span><span style="color:#569cd6;">=&gt; continue</span><span>, </span><span style="color:#608b4e;">// task no longer exists </span><span> }; </span><span> </span><span style="color:#569cd6;">let</span><span> waker = waker_cache </span><span> .entry(task_id) </span><span> .or_insert_with(|| TaskWaker::new(task_id, task_queue.clone())); </span><span> </span><span 
style="color:#569cd6;">let mut</span><span> context = Context::from_waker(waker); </span><span> </span><span style="color:#569cd6;">match</span><span> task.poll(</span><span style="color:#569cd6;">&amp;mut</span><span> context) { </span><span> Poll::Ready(()) </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#608b4e;">// task done -&gt; remove it and its cached waker </span><span> tasks.remove(</span><span style="color:#569cd6;">&amp;</span><span>task_id); </span><span> waker_cache.remove(</span><span style="color:#569cd6;">&amp;</span><span>task_id); </span><span> } </span><span> Poll::Pending </span><span style="color:#569cd6;">=&gt; </span><span>{} </span><span> } </span><span> } </span><span> } </span><span>} </span></code></pre> <p>The basic idea of this function is similar to our <code>SimpleExecutor</code>: Loop over all tasks in the <code>task_queue</code>, create a waker for each task, and then poll them. However, instead of adding pending tasks back to the end of the <code>task_queue</code>, we let our <code>TaskWaker</code> implementation take care of adding woken tasks back to the queue. The implementation of this waker type will be shown in a moment.</p> <p>Let’s look into some of the implementation details of this <code>run_ready_tasks</code> method:</p> <ul> <li> <p>We use <a href="https://doc.rust-lang.org/book/ch18-03-pattern-syntax.html#destructuring-to-break-apart-values"><em>destructuring</em></a> to split <code>self</code> into its three fields to avoid some borrow checker errors. Namely, our implementation needs to access the <code>self.task_queue</code> from within a closure, which currently tries to borrow <code>self</code> completely. 
This is a fundamental borrow checker issue that will be resolved when <a href="https://github.com/rust-lang/rfcs/pull/2229">RFC 2229</a> is <a href="https://github.com/rust-lang/rust/issues/53488">implemented</a>.</p> </li> <li> <p>For each popped task ID, we retrieve a mutable reference to the corresponding task from the <code>tasks</code> map. Since our <code>ScancodeStream</code> implementation registers wakers before checking whether a task needs to be put to sleep, it might happen that a wake-up occurs for a task that no longer exists. In this case, we simply ignore the wake-up and continue with the next ID from the queue.</p> </li> <li> <p>To avoid the performance overhead of creating a waker on each poll, we use the <code>waker_cache</code> map to store the waker for each task after it has been created. For this, we use the <a href="https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.entry"><code>BTreeMap::entry</code></a> method in combination with <a href="https://doc.rust-lang.org/alloc/collections/btree_map/enum.Entry.html#method.or_insert_with"><code>Entry::or_insert_with</code></a> to create a new waker if it doesn’t exist yet and then get a mutable reference to it. For creating a new waker, we clone the <code>task_queue</code> and pass it together with the task ID to the <code>TaskWaker::new</code> function (implementation shown below). Since the <code>task_queue</code> is wrapped into an <code>Arc</code>, the <code>clone</code> only increases the reference count of the value, but still points to the same heap-allocated queue. Note that reusing wakers like this is not possible for all waker implementations, but our <code>TaskWaker</code> type will allow it.</p> </li> </ul> <p>A task is finished when it returns <code>Poll::Ready</code>. 
In that case, we remove it from the <code>tasks</code> map using the <a href="https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.remove"><code>BTreeMap::remove</code></a> method. We also remove its cached waker, if it exists.</p> <h4 id="waker-design"><a class="zola-anchor" href="#waker-design" aria-label="Anchor link for: waker-design">🔗</a>Waker Design</h4> <p>The job of the waker is to push the ID of the woken task to the <code>task_queue</code> of the executor. We implement this by creating a new <code>TaskWaker</code> struct that stores the task ID and a reference to the <code>task_queue</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/executor.rs </span><span> </span><span style="color:#569cd6;">struct </span><span>TaskWaker { </span><span> task_id: TaskId, </span><span> task_queue: Arc&lt;ArrayQueue&lt;TaskId&gt;&gt;, </span><span>} </span></code></pre> <p>Since the ownership of the <code>task_queue</code> is shared between the executor and wakers, we use the <a href="https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html"><code>Arc</code></a> wrapper type to implement shared reference-counted ownership.</p> <p>The implementation of the wake operation is quite simple:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/executor.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>TaskWaker { </span><span> </span><span style="color:#569cd6;">fn </span><span>wake_task(</span><span style="color:#569cd6;">&amp;</span><span>self) { </span><span> self.task_queue.push(self.task_id).expect(</span><span style="color:#d69d85;">&quot;task_queue full&quot;</span><span>); </span><span> } </span><span>} </span></code></pre> <p>We push the 
<code>task_id</code> to the referenced <code>task_queue</code>. Since modifications to the <a href="https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html"><code>ArrayQueue</code></a> type only require a shared reference, we can implement this method on <code>&amp;self</code> instead of <code>&amp;mut self</code>.</p> <h5 id="the-wake-trait"><a class="zola-anchor" href="#the-wake-trait" aria-label="Anchor link for: the-wake-trait">🔗</a>The <code>Wake</code> Trait</h5> <p>In order to use our <code>TaskWaker</code> type for polling futures, we need to convert it to a <a href="https://doc.rust-lang.org/nightly/core/task/struct.Waker.html"><code>Waker</code></a> instance first. This is required because the <a href="https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll"><code>Future::poll</code></a> method takes a <a href="https://doc.rust-lang.org/nightly/core/task/struct.Context.html"><code>Context</code></a> instance as an argument, which can only be constructed from the <code>Waker</code> type. 
While we could do this by providing an implementation of the <a href="https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html"><code>RawWaker</code></a> type, it’s both simpler and safer to instead implement the <code>Arc</code>-based <a href="https://doc.rust-lang.org/nightly/alloc/task/trait.Wake.html"><code>Wake</code></a> trait and then use the <a href="https://doc.rust-lang.org/nightly/core/convert/trait.From.html"><code>From</code></a> implementations provided by the standard library to construct the <code>Waker</code>.</p> <p>The trait implementation looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/executor.rs </span><span> </span><span style="color:#569cd6;">use </span><span>alloc::task::Wake; </span><span> </span><span style="color:#569cd6;">impl </span><span>Wake </span><span style="color:#569cd6;">for </span><span>TaskWaker { </span><span> </span><span style="color:#569cd6;">fn </span><span>wake(self: Arc&lt;</span><span style="color:#569cd6;">Self</span><span>&gt;) { </span><span> self.wake_task(); </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>wake_by_ref(self: </span><span style="color:#569cd6;">&amp;</span><span>Arc&lt;</span><span style="color:#569cd6;">Self</span><span>&gt;) { </span><span> self.wake_task(); </span><span> } </span><span>} </span></code></pre> <p>Since wakers are commonly shared between the executor and the asynchronous tasks, the trait methods require that the <code>Self</code> instance is wrapped in the <a href="https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html"><code>Arc</code></a> type, which implements reference-counted ownership. 
This means that we have to move our <code>TaskWaker</code> to an <code>Arc</code> in order to call them.</p> <p>The difference between the <code>wake</code> and <code>wake_by_ref</code> methods is that the latter only requires a reference to the <code>Arc</code>, while the former takes ownership of the <code>Arc</code> and thus often requires an increase of the reference count. Not all types support waking by reference, so implementing the <code>wake_by_ref</code> method is optional. However, it can lead to better performance because it avoids unnecessary reference count modifications. In our case, we can simply forward both trait methods to our <code>wake_task</code> function, which requires only a shared <code>&amp;self</code> reference.</p> <h5 id="creating-wakers"><a class="zola-anchor" href="#creating-wakers" aria-label="Anchor link for: creating-wakers">🔗</a>Creating Wakers</h5> <p>Since the <code>Waker</code> type supports <a href="https://doc.rust-lang.org/nightly/core/convert/trait.From.html"><code>From</code></a> conversions for all <code>Arc</code>-wrapped values that implement the <code>Wake</code> trait, we can now implement the <code>TaskWaker::new</code> function that is required by our <code>Executor::run_ready_tasks</code> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/executor.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>TaskWaker { </span><span> </span><span style="color:#569cd6;">fn </span><span>new(task_id: TaskId, task_queue: Arc&lt;ArrayQueue&lt;TaskId&gt;&gt;) -&gt; Waker { </span><span> Waker::from(Arc::new(TaskWaker { </span><span> task_id, </span><span> task_queue, </span><span> })) </span><span> } </span><span>} </span></code></pre> <p>We create the <code>TaskWaker</code> using the passed <code>task_id</code> and <code>task_queue</code>. 
We then wrap the <code>TaskWaker</code> in an <code>Arc</code> and use the <code>Waker::from</code> implementation to convert it to a <a href="https://doc.rust-lang.org/nightly/core/task/struct.Waker.html"><code>Waker</code></a>. This <code>from</code> method takes care of constructing a <a href="https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html"><code>RawWakerVTable</code></a> and a <a href="https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html"><code>RawWaker</code></a> instance for our <code>TaskWaker</code> type. In case you’re interested in how it works in detail, check out the <a href="https://github.com/rust-lang/rust/blob/cdb50c6f2507319f29104a25765bfb79ad53395c/src/liballoc/task.rs#L58-L87">implementation in the <code>alloc</code> crate</a>.</p> <h4 id="a-run-method-1"><a class="zola-anchor" href="#a-run-method-1" aria-label="Anchor link for: a-run-method-1">🔗</a>A <code>run</code> Method</h4> <p>With our waker implementation in place, we can finally construct a <code>run</code> method for our executor:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/executor.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Executor { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>run(</span><span style="color:#569cd6;">&amp;mut </span><span>self) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">loop </span><span>{ </span><span> self.run_ready_tasks(); </span><span> } </span><span> } </span><span>} </span></code></pre> <p>This method just calls the <code>run_ready_tasks</code> function in a loop. 
While we could theoretically return from the function when the <code>tasks</code> map becomes empty, this would never happen since our <code>keyboard_task</code> never finishes, so a simple <code>loop</code> should suffice. Since the function never returns, we use the <code>!</code> return type to mark the function as <a href="https://doc.rust-lang.org/stable/rust-by-example/fn/diverging.html">diverging</a> to the compiler.</p> <p>We can now change our <code>kernel_main</code> to use our new <code>Executor</code> instead of the <code>SimpleExecutor</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::task::executor::Executor; </span><span style="color:#608b4e;">// new </span><span> </span><span style="color:#569cd6;">fn </span><span>kernel_main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#608b4e;">// […] initialization routines, including init_heap, test_main </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> executor = Executor::new(); </span><span style="color:#608b4e;">// new </span><span> executor.spawn(Task::new(example_task())); </span><span> executor.spawn(Task::new(keyboard::print_keypresses())); </span><span> executor.run(); </span><span>} </span></code></pre> <p>We only need to change the import and the type name. 
Since our <code>run</code> function is marked as diverging, the compiler knows that it never returns, so we no longer need a call to <code>hlt_loop</code> at the end of our <code>kernel_main</code> function.</p> <p>When we run our kernel using <code>cargo run</code> now, we see that keyboard input still works:</p> <p><img src="https://os.phil-opp.com/async-await/qemu-keyboard-output-again.gif" alt="QEMU printing “…..H…e…l…l..o….. …a..g..a….i…n…!”" /></p> <p>However, the CPU utilization of QEMU did not get any better. The reason for this is that we still keep the CPU busy the whole time. We no longer poll tasks until they are woken again, but we still check the <code>task_queue</code> in a busy loop. To fix this, we need to put the CPU to sleep if there is no more work to do.</p> <h4 id="sleep-if-idle"><a class="zola-anchor" href="#sleep-if-idle" aria-label="Anchor link for: sleep-if-idle">🔗</a>Sleep If Idle</h4> <p>The basic idea is to execute the <a href="https://en.wikipedia.org/wiki/HLT_(x86_instruction)"><code>hlt</code> instruction</a> when the <code>task_queue</code> is empty. This instruction puts the CPU to sleep until the next interrupt arrives. The fact that the CPU immediately becomes active again on interrupts ensures that we can still directly react when an interrupt handler pushes to the <code>task_queue</code>.</p> <p>To implement this, we create a new <code>sleep_if_idle</code> method in our executor and call it from our <code>run</code> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/executor.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Executor { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>run(</span><span style="color:#569cd6;">&amp;mut </span><span>self) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#569cd6;">loop </span><span>{ </span><span> self.run_ready_tasks(); </span><span> self.sleep_if_idle(); </span><span style="color:#608b4e;">// new </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>sleep_if_idle(</span><span style="color:#569cd6;">&amp;</span><span>self) { </span><span> </span><span style="color:#569cd6;">if </span><span>self.task_queue.is_empty() { </span><span> x86_64::instructions::hlt(); </span><span> } </span><span> } </span><span>} </span></code></pre> <p>Since we call <code>sleep_if_idle</code> directly after <code>run_ready_tasks</code>, which loops until the <code>task_queue</code> becomes empty, checking the queue again might seem unnecessary. However, a hardware interrupt might occur directly after <code>run_ready_tasks</code> returns, so there might be a new task in the queue at the time the <code>sleep_if_idle</code> function is called. Only if the queue is still empty, do we put the CPU to sleep by executing the <code>hlt</code> instruction through the <a href="https://docs.rs/x86_64/0.14.2/x86_64/instructions/fn.hlt.html"><code>instructions::hlt</code></a> wrapper function provided by the <a href="https://docs.rs/x86_64/0.14.2/x86_64/index.html"><code>x86_64</code></a> crate.</p> <p>Unfortunately, there is still a subtle race condition in this implementation. 
Since interrupts are asynchronous and can happen at any time, it is possible that an interrupt happens right between the <code>is_empty</code> check and the call to <code>hlt</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">if </span><span>self.task_queue.is_empty() { </span><span> </span><span style="color:#608b4e;">// &lt;--- interrupt can happen here </span><span> x86_64::instructions::hlt(); </span><span>} </span></code></pre> <p>In case this interrupt pushes to the <code>task_queue</code>, we put the CPU to sleep even though there is now a ready task. In the worst case, this could delay the handling of a keyboard interrupt until the next keypress or the next timer interrupt. So how do we prevent it?</p> <p>The answer is to disable interrupts on the CPU before the check and atomically enable them again together with the <code>hlt</code> instruction. This way, all interrupts that happen in between are delayed until after the <code>hlt</code> instruction so that no wake-ups are missed. 
To implement this approach, we can use the <a href="https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.enable_and_hlt.html"><code>interrupts::enable_and_hlt</code></a> function provided by the <a href="https://docs.rs/x86_64/0.14.2/x86_64/index.html"><code>x86_64</code></a> crate.</p> <p>The updated implementation of our <code>sleep_if_idle</code> function looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/task/executor.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Executor { </span><span> </span><span style="color:#569cd6;">fn </span><span>sleep_if_idle(</span><span style="color:#569cd6;">&amp;</span><span>self) { </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::interrupts::{self, enable_and_hlt}; </span><span> </span><span> interrupts::disable(); </span><span> </span><span style="color:#569cd6;">if </span><span>self.task_queue.is_empty() { </span><span> enable_and_hlt(); </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> interrupts::enable(); </span><span> } </span><span> } </span><span>} </span></code></pre> <p>To avoid race conditions, we disable interrupts before checking whether the <code>task_queue</code> is empty. If it is, we use the <a href="https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.enable_and_hlt.html"><code>enable_and_hlt</code></a> function to enable interrupts and put the CPU to sleep as a single atomic operation. In case the queue is no longer empty, it means that an interrupt woke a task after <code>run_ready_tasks</code> returned. In that case, we enable interrupts again and directly continue execution without executing <code>hlt</code>.</p> <p>Now our executor properly puts the CPU to sleep when there is nothing to do. 
We can see that the QEMU process has a much lower CPU utilization when we run our kernel using <code>cargo run</code> again.</p> <h4 id="possible-extensions"><a class="zola-anchor" href="#possible-extensions" aria-label="Anchor link for: possible-extensions">🔗</a>Possible Extensions</h4> <p>Our executor is now able to run tasks in an efficient way. It utilizes waker notifications to avoid polling waiting tasks and puts the CPU to sleep when there is currently no work to do. However, our executor is still quite basic, and there are many possible ways to extend its functionality:</p> <ul> <li><strong>Scheduling</strong>: For our <code>task_queue</code>, we currently use the <a href="https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html"><code>ArrayQueue</code></a> type to implement a <em>first in first out</em> (FIFO) strategy, which is often also called <em>round robin</em> scheduling. This strategy might not be the most efficient for all workloads. For example, it might make sense to prioritize latency-critical tasks or tasks that do a lot of I/O. See the <a href="http://pages.cs.wisc.edu/~remzi/OSTEP/cpu-sched.pdf">scheduling chapter</a> of the <a href="http://pages.cs.wisc.edu/~remzi/OSTEP/"><em>Operating Systems: Three Easy Pieces</em></a> book or the <a href="https://en.wikipedia.org/wiki/Scheduling_(computing)">Wikipedia article on scheduling</a> for more information.</li> <li><strong>Task Spawning</strong>: Our <code>Executor::spawn</code> method currently requires a <code>&amp;mut self</code> reference and is thus no longer available after invoking the <code>run</code> method. To fix this, we could create an additional <code>Spawner</code> type that shares some kind of queue with the executor and allows task creation from within tasks themselves. 
The queue could be the <code>task_queue</code> directly or a separate queue that the executor checks in its run loop.</li> <li><strong>Utilizing Threads</strong>: We don’t have support for threads yet, but we will add it in the next post. This will make it possible to launch multiple instances of the executor in different threads. The advantage of this approach is that the delay imposed by long-running tasks can be reduced because other tasks can run concurrently. This approach also makes it possible to utilize multiple CPU cores.</li> <li><strong>Load Balancing</strong>: When adding threading support, it becomes important to know how to distribute the tasks between the executors to ensure that all CPU cores are utilized. A common technique for this is <a href="https://en.wikipedia.org/wiki/Work_stealing"><em>work stealing</em></a>.</li> </ul> <h2 id="summary"><a class="zola-anchor" href="#summary" aria-label="Anchor link for: summary">🔗</a>Summary</h2> <p>We started this post by introducing <strong>multitasking</strong> and differentiating between <em>preemptive</em> multitasking, which forcibly interrupts running tasks regularly, and <em>cooperative</em> multitasking, which lets tasks run until they voluntarily give up control of the CPU.</p> <p>We then explored how Rust’s support of <strong>async/await</strong> provides a language-level implementation of cooperative multitasking. Rust bases its implementation on top of the polling-based <code>Future</code> trait, which abstracts asynchronous tasks. Using async/await, it is possible to work with futures almost like with normal synchronous code. The difference is that asynchronous functions return a <code>Future</code> again, which needs to be added to an executor at some point in order to run it.</p> <p>Behind the scenes, the compiler transforms async/await code to <em>state machines</em>, with each <code>.await</code> operation corresponding to a possible pause point. 
By utilizing its knowledge about the program, the compiler is able to save only the minimal state for each pause point, resulting in a very small memory consumption per task. One challenge is that the generated state machines might contain <em>self-referential</em> structs, for example when local variables of the asynchronous function reference each other. To prevent pointer invalidation, Rust uses the <code>Pin</code> type to ensure that futures cannot be moved in memory anymore after they have been polled for the first time.</p> <p>For our <strong>implementation</strong>, we first created a very basic executor that polls all spawned tasks in a busy loop without using the <code>Waker</code> type at all. We then showed the advantage of waker notifications by implementing an asynchronous keyboard task. The task defines a static <code>SCANCODE_QUEUE</code> using the mutex-free <code>ArrayQueue</code> type provided by the <code>crossbeam</code> crate. Instead of handling keypresses directly, the keyboard interrupt handler now puts all received scancodes in the queue and then wakes the registered <code>Waker</code> to signal that new input is available. On the receiving end, we created a <code>ScancodeStream</code> type to provide a <code>Future</code> resolving to the next scancode in the queue. This made it possible to create an asynchronous <code>print_keypresses</code> task that uses async/await to interpret and print the scancodes in the queue.</p> <p>To utilize the waker notifications of the keyboard task, we created a new <code>Executor</code> type that uses an <code>Arc</code>-shared <code>task_queue</code> for ready tasks. We implemented a <code>TaskWaker</code> type that pushes the ID of woken tasks directly to this <code>task_queue</code>, which are then polled again by the executor. To save power when no tasks are runnable, we added support for putting the CPU to sleep using the <code>hlt</code> instruction. 
Finally, we discussed some potential extensions to our executor, for example, providing multi-core support.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s Next?</h2> <p>Using async/await, we now have basic support for cooperative multitasking in our kernel. While cooperative multitasking is very efficient, it leads to latency problems when individual tasks keep running for too long, thus preventing other tasks from running. For this reason, it makes sense to also add support for preemptive multitasking to our kernel.</p> <p>In the next post, we will introduce <em>threads</em> as the most common form of preemptive multitasking. In addition to resolving the problem of long-running tasks, threads will also prepare us for utilizing multiple CPU cores and running untrusted user programs in the future.</p> Updates in February 2020 Mon, 02 Mar 2020 00:00:00 +0000 https://os.phil-opp.com/status-update/2020-03-02/ https://os.phil-opp.com/status-update/2020-03-02/ <p>This post gives an overview of the recent updates to the <em>Writing an OS in Rust</em> blog and the corresponding libraries and tools.</p> <h2 id="blog-os"><code>blog_os</code></h2> <p>The repository of the <em>Writing an OS in Rust</em> blog received the following updates:</p> <ul> <li><a href="https://github.com/phil-opp/blog_os/pull/722">Mention potential bump allocator extensions</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/738">Don’t panic on overflow in allocator; return null pointer instead</a> <ul> <li><a href="https://github.com/phil-opp/blog_os/pull/739">Update Allocator Designs post to signal OOM instead of panicking on overflow</a></li> </ul> </li> <li><a href="https://github.com/phil-opp/blog_os/pull/747">Update to Zola 0.10</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/692">Experimental Support for Community Translations</a> <ul> <li><a
href="https://github.com/phil-opp/blog_os/pull/694">Add translations from rustcc/writing-an-os-in-rust</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/748">Some fixes to generated translations</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/749">Add metadata to translations and list translators</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/752">Add a language selector for browser-supported languages</a></li> </ul> </li> <li><a href="https://github.com/phil-opp/blog_os/pull/751">Use zola check to check for dead links; fix all dead links found</a></li> <li><a href="https://github.com/phil-opp/blog_os/commit/0619f3a9e766c575ba1a4f2c6825049c177f8c70">Convert all external links to https (if supported)</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/732">Mention in “Paging Introduction” that a CPU with 5-level paging is available now</a></li> <li><a href="https://github.com/phil-opp/blog_os/commit/b532c052add9d3eac18663f1836bc9eee11007af">Double Faults: A missing handler leads to a #GP exception (not a #NP)</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/756">Updated pc-keyboard to <code>0.5.0</code></a> by <a href="https://github.com/RKennedy9064">@RKennedy9064</a></li> </ul> <h2 id="x86-64"><code>x86_64</code></h2> <p>The <code>x86_64</code> crate provides support for CPU-specific instructions, registers, and data structures of the <code>x86_64</code> architecture. 
There were lots of great contributions this month:</p> <ul> <li><a href="https://github.com/rust-osdev/x86_64/pull/119">Add User Mode registers</a> by <a href="https://github.com/vinaychandra">@vinaychandra</a> <span class="gray">(released together with <a href="https://github.com/rust-osdev/x86_64/pull/118">#118</a> as v0.9.0)</span></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/122">Improve PageTableIndex and PageOffset</a> by <a href="https://github.com/m-ou-se">@m-ou-se</a> <span class="gray">(released as v0.9.1)</span></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/124">Remove the <code>cast</code> dependency</a> by <a href="https://github.com/m-ou-se">@m-ou-se</a> <span class="gray">(released as v0.9.2)</span></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/126">Fix GitHub actions to run latest available rustfmt</a> by <a href="https://github.com/m-ou-se">@m-ou-se</a></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/127">Enable usage with non-nightly rust</a> by <a href="https://github.com/haraldh">@haraldh</a> <span class="gray">(released as v0.9.3)</span> <ul> <li><a href="https://github.com/rust-osdev/x86_64/pull/128">asm: add target_env = “musl” to pickup the underscore asm names</a> by <a href="https://github.com/haraldh">@haraldh</a> <span class="gray">(released as v0.9.4)</span></li> </ul> </li> <li><a href="https://github.com/rust-osdev/x86_64/pull/129">Add <code>#[inline]</code> attribute to small functions</a> by <a href="https://github.com/AntoineSebert">@AntoineSebert</a> <span class="gray">(released as v0.9.5)</span></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/130">Fix clippy warnings</a> by <a href="https://github.com/AntoineSebert">@AntoineSebert</a> <ul> <li><a href="https://github.com/rust-osdev/x86_64/pull/132">Resolve remaining clippy warnings and add clippy job to CI</a></li> </ul> </li> </ul> <h2 id="bootloader"><code>bootloader</code></h2> <p>The bootloader crate received 
two small bugfixes and one new feature this month:</p> <ul> <li><a href="https://github.com/rust-osdev/bootloader/pull/94">Objcopy replaces <code>.</code> chars with <code>_</code> chars</a> <span class="gray">(released as v0.8.6)</span></li> <li><a href="https://github.com/rust-osdev/bootloader/commit/af4f1016aa19fec3271226f8bfc2145521cf0c98">Fix docs.rs build by specifying an explicit target</a> <span class="gray">(released as v0.8.7)</span></li> <li><a href="https://github.com/rust-osdev/bootloader/pull/96">Add basic support for ELF thread local storage segments</a> <span class="gray">(released as v0.8.8)</span></li> </ul> <h2 id="bootimage"><code>bootimage</code></h2> <p>There were no updates to the <code>bootimage</code> tool this month.</p> <h2 id="cargo-xbuild"><code>cargo-xbuild</code></h2> <p>The <code>cargo-xbuild</code> crate provides support for cross-compiling <code>libcore</code> and <code>liballoc</code>. It received the following contributions this month:</p> <ul> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/56">Added new option to the configuration table</a> by <a href="https://github.com/rust-osdev/cargo-xbuild/pull/56">@parraman</a> <span class="gray">(released as v0.5.22)</span></li> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/57">Pick up xbuild config from workspace manifest</a> by <a href="https://github.com/ascjones">@ascjones</a> <span class="gray">(released as v0.5.23)</span></li> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/59">Make <code>fn build</code> and <code>Args</code> public to enable use as lib</a> by <a href="https://github.com/ascjones">@ascjones</a> <span class="gray">(released as v0.5.24)</span></li> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/61">Fix: Not all projects have a root package</a> <span class="gray">(released as v0.5.25)</span></li> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/62">Improvements to args and config for lib usage</a> by 
<a href="https://github.com/ascjones">@ascjones</a> <span class="gray">(released as v0.5.26)</span></li> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/64">Add <code>cargo xfix</code> command</a> by <a href="https://github.com/tjhu">@tjhu</a> <span class="gray">(released as v0.5.27)</span></li> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/65">Update dependencies</a> by <a href="https://github.com/parasyte">@parasyte</a> <span class="gray">(released as v0.5.28)</span></li> </ul> <h2 id="uart-16550"><code>uart_16550</code></h2> <p>The <code>uart_16550</code> crate, which provides basic support for uart_16550 serial output, received the following updates:</p> <ul> <li><a href="https://github.com/rust-osdev/uart_16550/pull/6">Switch CI to GitHub Actions</a></li> <li><a href="https://github.com/rust-osdev/uart_16550/pull/5">Cargo.toml: update x86_64 dependency</a> by <a href="https://github.com/haraldh">@haraldh</a> <span class="gray">(released as v0.2.3)</span></li> <li><a href="https://github.com/rust-osdev/uart_16550/pull/7">Enable usage with non-nightly rust</a> by <a href="https://github.com/haraldh">@haraldh</a> <span class="gray">(released as v0.2.4)</span></li> </ul> <h2 id="multiboot2-elf64"><code>multiboot2-elf64</code></h2> <p>The <code>multiboot2-elf64</code> crate provides abstractions for reading the boot information of the multiboot 2 standard, which is implemented by bootloaders like GRUB. 
There were two updates to the crate in February:</p> <ul> <li><a href="https://github.com/rust-osdev/multiboot2-elf64/pull/61">Add MemoryAreaType, to allow users to access memory area types in a type-safe way</a> by <a href="https://github.com/CWood1">@CWood1</a></li> <li><a href="https://github.com/rust-osdev/multiboot2-elf64/pull/62">Add some basic documentation</a> by <a href="https://github.com/rust-osdev/multiboot2-elf64/pull/62">@mental32</a> <span class="gray">(released as v0.8.2)</span></li> </ul> Updates in January 2020 Sat, 01 Feb 2020 00:00:00 +0000 https://os.phil-opp.com/status-update/2020-02-01/ https://os.phil-opp.com/status-update/2020-02-01/ <p>This post gives an overview of the recent updates to the <em>Writing an OS in Rust</em> blog and the corresponding libraries and tools.</p> <h2 id="blog-os"><code>blog_os</code></h2> <p>The repository of the <em>Writing an OS in Rust</em> blog received the following updates:</p> <ul> <li><a href="https://github.com/phil-opp/blog_os/pull/714">Move #[global_allocator] into allocator module</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/716">Update many_boxes test to scale with heap size</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/719">New post about allocator designs</a> 🎉</li> <li><a href="https://github.com/phil-opp/blog_os/pull/721">Provide multiple implementations of align_up and mention performance</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/725">Refactor Simplified Chinese translation of post 3</a> by <a href="https://github.com/Rustin-Liu">@Rustin-Liu</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/726">Use checked addition for allocator implementations</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/728">Fix dummy allocator code example</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/729">Some style updates to the front page</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/733">Mark active 
item in table of contents</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/734">Make active section link more discreet</a> by <a href="https://github.com/Menschenkindlein">@Menschenkindlein</a></li> </ul> <p>I also started working on the upcoming post about threads.</p> <h2 id="bootloader"><code>bootloader</code></h2> <p>The bootloader crate received two minor updates this month:</p> <ul> <li><a href="https://github.com/rust-osdev/bootloader/pull/91">Move architecture checks from build script into lib.rs</a></li> <li><a href="https://github.com/rust-osdev/bootloader/pull/92">Update x86_64 dependency to version 0.8.3</a> by <a href="https://github.com/vinaychandra">@vinaychandra</a></li> </ul> <p>Since I focused my time on the new <em>Allocator Designs</em> post, I did not have the time to make more progress on my plan to rewrite the 16-bit/32-bit stages of the bootloader in Rust. I hope to get back to it soon.</p> <h2 id="bootimage"><code>bootimage</code></h2> <p>There were no updates to the <code>bootimage</code> tool this month.</p> <h2 id="x86-64"><code>x86_64</code></h2> <p>The following changes were merged this month:</p> <ul> <li><a href="https://github.com/rust-osdev/x86_64/pull/115">Allow immediate port version of in/out instructions</a> by <a href="https://github.com/m-ou-se">@m-ou-se</a></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/116">Make more functions const</a> by <a href="https://github.com/m-ou-se">@m-ou-se</a> <ul> <li>Released as version 0.8.3</li> </ul> </li> <li><a href="https://github.com/rust-osdev/x86_64/pull/118">Return the UnusedPhysFrame on MapToError::PageAlreadyMapped</a> by <a href="https://github.com/haraldh">@haraldh</a> <ul> <li>This is a <strong>breaking change</strong> since it changes the signature of a type.</li> <li>No new release was published yet to give us the option to bundle it with other breaking changes.</li> </ul> </li> </ul> <p>There are also some pull requests that have some open design 
questions and are still being discussed:</p> <ul> <li><a href="https://github.com/rust-osdev/x86_64/pull/114">Add p23_insert_flag_mask argument to mapper.map_to()</a> by <a href="https://github.com/haraldh">@haraldh</a> <ul> <li>Related proposal: <a href="https://github.com/rust-osdev/x86_64/issues/121">Page Table Visitors</a> by <a href="https://github.com/mark-i-m">@mark-i-m</a></li> </ul> </li> <li><a href="https://github.com/rust-osdev/x86_64/pull/119">Add User Mode registers</a> by <a href="https://github.com/vinaychandra">@vinaychandra</a></li> </ul> <p>Please feel free to join these discussions if you have opinions on the matter.</p> <h2 id="cargo-xbuild"><code>cargo-xbuild</code></h2> <p>The <code>cargo-xbuild</code> crate, which cross-compiles the sysroot, received the following updates this month:</p> <ul> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/52">Override target path for building sysroot</a> by <a href="https://github.com/upsuper">@upsuper</a> <ul> <li>Published as version 0.5.21</li> </ul> </li> </ul> <h2 id="uart-16550"><code>uart_16550</code></h2> <p>The <code>uart_16550</code> crate, which provides basic support for uart_16550 serial output, received a small dependency update:</p> <ul> <li><a href="https://github.com/rust-osdev/uart_16550/pull/4">Update dependency for x86_64</a> by <a href="https://github.com/haraldh">@haraldh</a> <ul> <li>Published as version 0.2.2</li> </ul> </li> </ul> Allocator Designs Mon, 20 Jan 2020 00:00:00 +0000 https://os.phil-opp.com/allocator-designs/ https://os.phil-opp.com/allocator-designs/ <p>This post explains how to implement heap allocators from scratch. It presents and discusses different allocator designs, including bump allocation, linked list allocation, and fixed-size block allocation. 
For each of the three designs, we will create a basic implementation that can be used for our kernel.</p> <span id="continue-reading"></span> <p>This blog is openly developed on <a href="https://github.com/phil-opp/blog_os">GitHub</a>. If you have any problems or questions, please open an issue there. You can also leave comments <a href="https://os.phil-opp.com/allocator-designs/#comments">at the bottom</a>. The complete source code for this post can be found in the <a href="https://github.com/phil-opp/blog_os/tree/post-11"><code>post-11</code></a> branch.</p> <!-- fix for zola anchor checker (target is in template): <a id="comments"> --> <!-- toc --> <h2 id="introduction"><a class="zola-anchor" href="#introduction" aria-label="Anchor link for: introduction">🔗</a>Introduction</h2> <p>In the <a href="https://os.phil-opp.com/heap-allocation/">previous post</a>, we added basic support for heap allocations to our kernel. For that, we <a href="https://os.phil-opp.com/heap-allocation/#creating-a-kernel-heap">created a new memory region</a> in the page tables and <a href="https://os.phil-opp.com/heap-allocation/#using-an-allocator-crate">used the <code>linked_list_allocator</code> crate</a> to manage that memory. While we have a working heap now, we left most of the work to the allocator crate without trying to understand how it works.</p> <p>In this post, we will show how to create our own heap allocator from scratch instead of relying on an existing allocator crate. We will discuss different allocator designs, including a simplistic <em>bump allocator</em> and a basic <em>fixed-size block allocator</em>, and use this knowledge to implement an allocator with improved performance (compared to the <code>linked_list_allocator</code> crate).</p> <h3 id="design-goals"><a class="zola-anchor" href="#design-goals" aria-label="Anchor link for: design-goals">🔗</a>Design Goals</h3> <p>The responsibility of an allocator is to manage the available heap memory. 
It needs to return unused memory on <code>alloc</code> calls and keep track of memory freed by <code>dealloc</code> so that it can be reused. Most importantly, it must never hand out memory that is already in use somewhere else because this would cause undefined behavior.</p> <p>Apart from correctness, there are many secondary design goals. For example, the allocator should effectively utilize the available memory and keep <a href="https://en.wikipedia.org/wiki/Fragmentation_(computing)"><em>fragmentation</em></a> low. Furthermore, it should work well for concurrent applications and scale to any number of processors. For maximal performance, it could even optimize the memory layout with respect to the CPU caches to improve <a href="https://www.geeksforgeeks.org/locality-of-reference-and-cache-operation-in-cache-memory/">cache locality</a> and avoid <a href="https://mechanical-sympathy.blogspot.de/2011/07/false-sharing.html">false sharing</a>.</p> <p>These requirements can make good allocators very complex. For example, <a href="http://jemalloc.net/">jemalloc</a> has over 30,000 lines of code. This complexity is often undesirable in kernel code, where a single bug can lead to severe security vulnerabilities. Fortunately, the allocation patterns of kernel code are often much simpler compared to userspace code, so that relatively simple allocator designs often suffice.</p> <p>In the following, we present three possible kernel allocator designs and explain their advantages and drawbacks.</p> <h2 id="bump-allocator"><a class="zola-anchor" href="#bump-allocator" aria-label="Anchor link for: bump-allocator">🔗</a>Bump Allocator</h2> <p>The simplest allocator design is a <em>bump allocator</em> (also known as a <em>stack allocator</em>). It allocates memory linearly and only keeps track of the number of allocated bytes and the number of allocations. 
It is only useful in very specific use cases because it has a severe limitation: it can only free all memory at once.</p> <h3 id="idea"><a class="zola-anchor" href="#idea" aria-label="Anchor link for: idea">🔗</a>Idea</h3> <p>The idea behind a bump allocator is to linearly allocate memory by increasing (<em>“bumping”</em>) a <code>next</code> variable, which points to the start of the unused memory. At the beginning, <code>next</code> is equal to the start address of the heap. On each allocation, <code>next</code> is increased by the allocation size so that it always points to the boundary between used and unused memory:</p> <p><img src="https://os.phil-opp.com/allocator-designs/bump-allocation.svg" alt="The heap memory area at three points in time: 1: A single allocation exists at the start of the heap; the next pointer points to its end. 2: A second allocation was added right after the first; the next pointer points to the end of the second allocation. 3: A third allocation was added right after the second one; the next pointer points to the end of the third allocation." /></p> <p>The <code>next</code> pointer only moves in a single direction and thus never hands out the same memory region twice. When it reaches the end of the heap, no more memory can be allocated, resulting in an out-of-memory error on the next allocation.</p> <p>A bump allocator is often implemented with an allocation counter, which is increased by 1 on each <code>alloc</code> call and decreased by 1 on each <code>dealloc</code> call. When the allocation counter reaches zero, it means that all allocations on the heap have been deallocated. 
In this case, the <code>next</code> pointer can be reset to the start address of the heap, so that the complete heap memory is available for allocations again.</p> <h3 id="implementation"><a class="zola-anchor" href="#implementation" aria-label="Anchor link for: implementation">🔗</a>Implementation</h3> <p>We start our implementation by declaring a new <code>allocator::bump</code> submodule:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator.rs </span><span> </span><span style="color:#569cd6;">pub mod </span><span>bump; </span></code></pre> <p>The content of the submodule lives in a new <code>src/allocator/bump.rs</code> file, which we create with the following content:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/bump.rs </span><span> </span><span style="color:#569cd6;">pub struct </span><span>BumpAllocator { </span><span> heap_start: </span><span style="color:#569cd6;">usize</span><span>, </span><span> heap_end: </span><span style="color:#569cd6;">usize</span><span>, </span><span> next: </span><span style="color:#569cd6;">usize</span><span>, </span><span> allocations: </span><span style="color:#569cd6;">usize</span><span>, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>BumpAllocator { </span><span> </span><span style="color:#608b4e;">/// Creates a new empty bump allocator. 
</span><span> </span><span style="color:#569cd6;">pub const fn </span><span>new() -&gt; </span><span style="color:#569cd6;">Self </span><span>{ </span><span> BumpAllocator { </span><span> heap_start: </span><span style="color:#b5cea8;">0</span><span>, </span><span> heap_end: </span><span style="color:#b5cea8;">0</span><span>, </span><span> next: </span><span style="color:#b5cea8;">0</span><span>, </span><span> allocations: </span><span style="color:#b5cea8;">0</span><span>, </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">/// Initializes the bump allocator with the given heap bounds. </span><span> </span><span style="color:#608b4e;">/// </span><span> </span><span style="color:#608b4e;">/// This method is unsafe because the caller must ensure that the given </span><span> </span><span style="color:#608b4e;">/// memory range is unused. Also, this method must be called only once. </span><span> </span><span style="color:#569cd6;">pub unsafe fn </span><span>init(</span><span style="color:#569cd6;">&amp;mut </span><span>self, heap_start: </span><span style="color:#569cd6;">usize</span><span>, heap_size: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> self.heap_start = heap_start; </span><span> self.heap_end = heap_start + heap_size; </span><span> self.next = heap_start; </span><span> } </span><span>} </span></code></pre> <p>The <code>heap_start</code> and <code>heap_end</code> fields keep track of the lower and upper bounds of the heap memory region. The caller needs to ensure that these addresses are valid, otherwise the allocator would return invalid memory. For this reason, the <code>init</code> function needs to be <code>unsafe</code> to call.</p> <p>The purpose of the <code>next</code> field is to always point to the first unused byte of the heap, i.e., the start address of the next allocation. 
It is set to <code>heap_start</code> in the <code>init</code> function because at the beginning, the entire heap is unused. On each allocation, this field will be increased by the allocation size (<em>“bumped”</em>) to ensure that we don’t return the same memory region twice.</p> <p>The <code>allocations</code> field is a simple counter for the active allocations with the goal of resetting the allocator after the last allocation has been freed. It is initialized with 0.</p> <p>We chose to create a separate <code>init</code> function instead of performing the initialization directly in <code>new</code> in order to keep the interface identical to the allocator provided by the <code>linked_list_allocator</code> crate. This way, the allocators can be switched without additional code changes.</p> <h3 id="implementing-globalalloc"><a class="zola-anchor" href="#implementing-globalalloc" aria-label="Anchor link for: implementing-globalalloc">🔗</a>Implementing <code>GlobalAlloc</code></h3> <p>As <a href="https://os.phil-opp.com/heap-allocation/#the-allocator-interface">explained in the previous post</a>, all heap allocators need to implement the <a href="https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html"><code>GlobalAlloc</code></a> trait, which is defined like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub unsafe trait </span><span>GlobalAlloc { </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>alloc(</span><span style="color:#569cd6;">&amp;</span><span>self, layout: Layout) -&gt; </span><span style="color:#569cd6;">*mut u8</span><span>; </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>dealloc(</span><span style="color:#569cd6;">&amp;</span><span>self, ptr: </span><span style="color:#569cd6;">*mut u8</span><span>, layout: Layout); </span><span> </span><span> </span><span 
style="color:#569cd6;">unsafe fn </span><span>alloc_zeroed(</span><span style="color:#569cd6;">&amp;</span><span>self, layout: Layout) -&gt; </span><span style="color:#569cd6;">*mut u8 </span><span>{ </span><span style="color:#569cd6;">... </span><span>} </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>realloc( </span><span> </span><span style="color:#569cd6;">&amp;</span><span>self, </span><span> ptr: </span><span style="color:#569cd6;">*mut u8</span><span>, </span><span> layout: Layout, </span><span> new_size: </span><span style="color:#569cd6;">usize </span><span> ) -&gt; </span><span style="color:#569cd6;">*mut u8 </span><span>{ </span><span style="color:#569cd6;">... </span><span>} </span><span>} </span></code></pre> <p>Only the <code>alloc</code> and <code>dealloc</code> methods are required; the other two methods have default implementations and can be omitted.</p> <h4 id="first-implementation-attempt"><a class="zola-anchor" href="#first-implementation-attempt" aria-label="Anchor link for: first-implementation-attempt">🔗</a>First Implementation Attempt</h4> <p>Let’s try to implement the <code>alloc</code> method for our <code>BumpAllocator</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/bump.rs </span><span> </span><span style="color:#569cd6;">use </span><span>alloc::alloc::{GlobalAlloc, Layout}; </span><span> </span><span style="color:#569cd6;">unsafe impl </span><span>GlobalAlloc </span><span style="color:#569cd6;">for </span><span>BumpAllocator { </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>alloc(</span><span style="color:#569cd6;">&amp;</span><span>self, layout: Layout) -&gt; </span><span style="color:#569cd6;">*mut u8 </span><span>{ </span><span> </span><span style="color:#608b4e;">// TODO alignment and bounds check </span><span> </span><span 
style="color:#569cd6;">let</span><span> alloc_start = self.next; </span><span> self.next = alloc_start + layout.size(); </span><span> self.allocations += </span><span style="color:#b5cea8;">1</span><span>; </span><span> alloc_start </span><span style="color:#569cd6;">as *mut u8 </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>dealloc(</span><span style="color:#569cd6;">&amp;</span><span>self, _ptr: </span><span style="color:#569cd6;">*mut u8</span><span>, _layout: Layout) { </span><span> todo!(); </span><span> } </span><span>} </span></code></pre> <p>First, we use the <code>next</code> field as the start address for our allocation. Then we update the <code>next</code> field to point to the end address of the allocation, which is the next unused address on the heap. Before returning the start address of the allocation as a <code>*mut u8</code> pointer, we increase the <code>allocations</code> counter by 1.</p> <p>Note that we don’t perform any bounds checks or alignment adjustments, so this implementation is not safe yet. This does not matter much because it fails to compile anyway with the following error:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error[E0594]: cannot assign to `self.next` which is behind a `&amp;` reference </span><span> --&gt; src/allocator/bump.rs:29:9 </span><span> | </span><span>29 | self.next = alloc_start + layout.size(); </span><span> | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `self` is a `&amp;` reference, so the data it refers to cannot be written </span></code></pre> <p>(The same error also occurs for the <code>self.allocations += 1</code> line. 
We omitted it here for brevity.)</p> <p>The error occurs because the <a href="https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc"><code>alloc</code></a> and <a href="https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc"><code>dealloc</code></a> methods of the <code>GlobalAlloc</code> trait only operate on an immutable <code>&amp;self</code> reference, so updating the <code>next</code> and <code>allocations</code> fields is not possible. This is problematic because updating <code>next</code> on every allocation is the essential principle of a bump allocator.</p> <h4 id="globalalloc-and-mutability"><a class="zola-anchor" href="#globalalloc-and-mutability" aria-label="Anchor link for: globalalloc-and-mutability">🔗</a><code>GlobalAlloc</code> and Mutability</h4> <p>Before we look at a possible solution to this mutability problem, let’s try to understand why the <code>GlobalAlloc</code> trait methods are defined with <code>&amp;self</code> arguments: As we saw <a href="https://os.phil-opp.com/heap-allocation/#the-global-allocator-attribute">in the previous post</a>, the global heap allocator is defined by adding the <code>#[global_allocator]</code> attribute to a <code>static</code> that implements the <code>GlobalAlloc</code> trait. Static variables are immutable in Rust, so there is no way to call a method that takes <code>&amp;mut self</code> on the static allocator. For this reason, all the methods of <code>GlobalAlloc</code> only take an immutable <code>&amp;self</code> reference.</p> <p>Fortunately, there is a way to get a <code>&amp;mut self</code> reference from a <code>&amp;self</code> reference: We can use synchronized <a href="https://doc.rust-lang.org/book/ch15-05-interior-mutability.html">interior mutability</a> by wrapping the allocator in a <a href="https://docs.rs/spin/0.5.0/spin/struct.Mutex.html"><code>spin::Mutex</code></a> spinlock. 
This type provides a <code>lock</code> method that performs <a href="https://en.wikipedia.org/wiki/Mutual_exclusion">mutual exclusion</a> and thus safely turns a <code>&amp;self</code> reference into a <code>&amp;mut self</code> reference. We’ve already used the wrapper type multiple times in our kernel, for example for the <a href="https://os.phil-opp.com/vga-text-mode/#spinlocks">VGA text buffer</a>.</p> <h4 id="a-locked-wrapper-type"><a class="zola-anchor" href="#a-locked-wrapper-type" aria-label="Anchor link for: a-locked-wrapper-type">🔗</a>A <code>Locked</code> Wrapper Type</h4> <p>With the help of the <code>spin::Mutex</code> wrapper type, we can implement the <code>GlobalAlloc</code> trait for our bump allocator. The trick is to implement the trait not for the <code>BumpAllocator</code> directly, but for the wrapped <code>spin::Mutex&lt;BumpAllocator&gt;</code> type:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">unsafe impl </span><span>GlobalAlloc </span><span style="color:#569cd6;">for </span><span>spin::Mutex&lt;BumpAllocator&gt; {…} </span></code></pre> <p>Unfortunately, this still doesn’t work because the Rust compiler does not permit implementing a trait defined in another crate for a type that is also defined in another crate:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error[E0117]: only traits defined in the current crate can be implemented for arbitrary types </span><span> --&gt; src/allocator/bump.rs:28:1 </span><span> | </span><span>28 | unsafe impl GlobalAlloc for spin::Mutex&lt;BumpAllocator&gt; { </span><span> | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^-------------------------- </span><span> | | | </span><span> | | `spin::mutex::Mutex` is not defined in the current crate </span><span> | impl doesn&#39;t use only types from inside the current crate </span><span> | </span><span> = note: define and implement a trait or new type instead 
</span></code></pre> <p>To fix this, we need to create our own wrapper type around <code>spin::Mutex</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator.rs </span><span> </span><span style="color:#608b4e;">/// A wrapper around spin::Mutex to permit trait implementations. </span><span style="color:#569cd6;">pub struct </span><span>Locked&lt;A&gt; { </span><span> inner: spin::Mutex&lt;A&gt;, </span><span>} </span><span> </span><span style="color:#569cd6;">impl</span><span>&lt;A&gt; Locked&lt;A&gt; { </span><span> </span><span style="color:#569cd6;">pub const fn </span><span>new(inner: A) -&gt; </span><span style="color:#569cd6;">Self </span><span>{ </span><span> Locked { </span><span> inner: spin::Mutex::new(inner), </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>lock(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; spin::MutexGuard&lt;A&gt; { </span><span> self.inner.lock() </span><span> } </span><span>} </span></code></pre> <p>The type is a generic wrapper around a <code>spin::Mutex&lt;A&gt;</code>. It imposes no restrictions on the wrapped type <code>A</code>, so it can be used to wrap all kinds of types, not just allocators. It provides a simple <code>new</code> constructor function that wraps a given value. For convenience, it also provides a <code>lock</code> function that calls <code>lock</code> on the wrapped <code>Mutex</code>. 
Since the <code>Locked</code> type is general enough to be useful for other allocator implementations too, we put it in the parent <code>allocator</code> module.</p> <h4 id="implementation-for-locked-bumpallocator"><a class="zola-anchor" href="#implementation-for-locked-bumpallocator" aria-label="Anchor link for: implementation-for-locked-bumpallocator">🔗</a>Implementation for <code>Locked&lt;BumpAllocator&gt;</code></h4> <p>The <code>Locked</code> type is defined in our own crate (in contrast to <code>spin::Mutex</code>), so we can use it to implement <code>GlobalAlloc</code> for our bump allocator. The full implementation looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/bump.rs </span><span> </span><span style="color:#569cd6;">use super</span><span>::{align_up, Locked}; </span><span style="color:#569cd6;">use </span><span>alloc::alloc::{GlobalAlloc, Layout}; </span><span style="color:#569cd6;">use </span><span>core::ptr; </span><span> </span><span style="color:#569cd6;">unsafe impl </span><span>GlobalAlloc </span><span style="color:#569cd6;">for </span><span>Locked&lt;BumpAllocator&gt; { </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>alloc(</span><span style="color:#569cd6;">&amp;</span><span>self, layout: Layout) -&gt; </span><span style="color:#569cd6;">*mut u8 </span><span>{ </span><span> </span><span style="color:#569cd6;">let mut</span><span> bump = self.lock(); </span><span style="color:#608b4e;">// get a mutable reference </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> alloc_start = align_up(bump.next, layout.align()); </span><span> </span><span style="color:#569cd6;">let</span><span> alloc_end = </span><span style="color:#569cd6;">match</span><span> alloc_start.checked_add(layout.size()) { </span><span> Some(end) </span><span 
style="color:#569cd6;">=&gt;</span><span> end, </span><span> None </span><span style="color:#569cd6;">=&gt; return </span><span>ptr::null_mut(), </span><span> }; </span><span> </span><span> </span><span style="color:#569cd6;">if</span><span> alloc_end &gt; bump.heap_end { </span><span> ptr::null_mut() </span><span style="color:#608b4e;">// out of memory </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> bump.next = alloc_end; </span><span> bump.allocations += </span><span style="color:#b5cea8;">1</span><span>; </span><span> alloc_start </span><span style="color:#569cd6;">as *mut u8 </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>dealloc(</span><span style="color:#569cd6;">&amp;</span><span>self, _ptr: </span><span style="color:#569cd6;">*mut u8</span><span>, _layout: Layout) { </span><span> </span><span style="color:#569cd6;">let mut</span><span> bump = self.lock(); </span><span style="color:#608b4e;">// get a mutable reference </span><span> </span><span> bump.allocations -= </span><span style="color:#b5cea8;">1</span><span>; </span><span> </span><span style="color:#569cd6;">if</span><span> bump.allocations == </span><span style="color:#b5cea8;">0 </span><span>{ </span><span> bump.next = bump.heap_start; </span><span> } </span><span> } </span><span>} </span></code></pre> <p>The first step for both <code>alloc</code> and <code>dealloc</code> is to call the <a href="https://docs.rs/spin/0.5.0/spin/struct.Mutex.html#method.lock"><code>Mutex::lock</code></a> method through the <code>inner</code> field to get a mutable reference to the wrapped allocator type. 
The instance remains locked until the end of the method, so that no data race can occur in multithreaded contexts (we will add threading support soon).</p> <p>Compared to the previous prototype, the <code>alloc</code> implementation now respects alignment requirements and performs a bounds check to ensure that the allocations stay inside the heap memory region. The first step is to round up the <code>next</code> address to the alignment specified by the <code>Layout</code> argument. The code for the <code>align_up</code> function is shown in a moment. We then add the requested allocation size to <code>alloc_start</code> to get the end address of the allocation. To prevent integer overflow on large allocations, we use the <a href="https://doc.rust-lang.org/std/primitive.usize.html#method.checked_add"><code>checked_add</code></a> method. If an overflow occurs or if the resulting end address of the allocation is larger than the end address of the heap, we return a null pointer to signal an out-of-memory situation. Otherwise, we update the <code>next</code> address and increase the <code>allocations</code> counter by 1 like before. Finally, we return the <code>alloc_start</code> address converted to a <code>*mut u8</code> pointer.</p> <p>The <code>dealloc</code> function ignores the given pointer and <code>Layout</code> arguments. Instead, it just decreases the <code>allocations</code> counter. If the counter reaches <code>0</code> again, all allocations have been freed. In this case, it resets the <code>next</code> address to the <code>heap_start</code> address to make the complete heap memory available again.</p> <h4 id="address-alignment"><a class="zola-anchor" href="#address-alignment" aria-label="Anchor link for: address-alignment">🔗</a>Address Alignment</h4> <p>The <code>align_up</code> function is general enough that we can put it into the parent <code>allocator</code> module. 
A basic implementation looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator.rs </span><span> </span><span style="color:#608b4e;">/// Align the given address `addr` upwards to alignment `align`. </span><span style="color:#569cd6;">fn </span><span>align_up(addr: </span><span style="color:#569cd6;">usize</span><span>, align: </span><span style="color:#569cd6;">usize</span><span>) -&gt; </span><span style="color:#569cd6;">usize </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> remainder = addr % align; </span><span> </span><span style="color:#569cd6;">if</span><span> remainder == </span><span style="color:#b5cea8;">0 </span><span>{ </span><span> addr </span><span style="color:#608b4e;">// addr already aligned </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> addr - remainder + align </span><span> } </span><span>} </span></code></pre> <p>The function first computes the <a href="https://en.wikipedia.org/wiki/Euclidean_division">remainder</a> of the division of <code>addr</code> by <code>align</code>. If the remainder is <code>0</code>, the address is already aligned with the given alignment. Otherwise, we align the address by subtracting the remainder (so that the new remainder is 0) and then adding the alignment (so that the address does not become smaller than the original address).</p> <p>Note that this isn’t the most efficient way to implement this function. A much faster implementation looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">/// Align the given address `addr` upwards to alignment `align`. 
</span><span style="color:#608b4e;">/// </span><span style="color:#608b4e;">/// Requires that `align` is a power of two. </span><span style="color:#569cd6;">fn </span><span>align_up(addr: </span><span style="color:#569cd6;">usize</span><span>, align: </span><span style="color:#569cd6;">usize</span><span>) -&gt; </span><span style="color:#569cd6;">usize </span><span>{ </span><span> (addr + align - </span><span style="color:#b5cea8;">1</span><span>) </span><span style="color:#569cd6;">&amp; !</span><span>(align - </span><span style="color:#b5cea8;">1</span><span>) </span><span>} </span></code></pre> <p>This method requires <code>align</code> to be a power of two, which is guaranteed when the alignment comes from the <a href="https://doc.rust-lang.org/alloc/alloc/struct.Layout.html"><code>Layout</code></a> parameter of the <code>GlobalAlloc</code> trait. This makes it possible to create a <a href="https://en.wikipedia.org/wiki/Mask_(computing)">bitmask</a> to align the address in a very efficient way. To understand how it works, let’s go through it step by step, starting on the right side:</p> <ul> <li>Since <code>align</code> is a power of two, its <a href="https://en.wikipedia.org/wiki/Binary_number#Representation">binary representation</a> has only a single bit set (e.g. <code>0b000100000</code>). This means that <code>align - 1</code> has all the lower bits set (e.g. <code>0b000011111</code>).</li> <li>By creating the <a href="https://en.wikipedia.org/wiki/Bitwise_operation#NOT">bitwise <code>NOT</code></a> through the <code>!</code> operator, we get a number that has all the bits set except for the bits lower than <code>align</code> (e.g. <code>0b…111111111100000</code>).</li> <li>By performing a <a href="https://en.wikipedia.org/wiki/Bitwise_operation#AND">bitwise <code>AND</code></a> on an address and <code>!(align - 1)</code>, we align the address <em>downwards</em>. 
This works by clearing all the bits that are lower than <code>align</code>.</li> <li>Since we want to align upwards instead of downwards, we increase the <code>addr</code> by <code>align - 1</code> before performing the bitwise <code>AND</code>. This way, already aligned addresses remain the same while non-aligned addresses are rounded up to the next alignment boundary.</li> </ul> <p>Which variant you choose is up to you. For power-of-two alignments, both compute the same result; they only differ in how they get there.</p> <h3 id="using-it"><a class="zola-anchor" href="#using-it" aria-label="Anchor link for: using-it">🔗</a>Using It</h3> <p>To use the bump allocator instead of the <code>linked_list_allocator</code> crate, we need to update the <code>ALLOCATOR</code> static in <code>allocator.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator.rs </span><span> </span><span style="color:#569cd6;">use </span><span>bump::BumpAllocator; </span><span> </span><span>#[global_allocator] </span><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">ALLOCATOR</span><span>: Locked&lt;BumpAllocator&gt; = Locked::new(BumpAllocator::new()); </span></code></pre> <p>Here it becomes important that we declared <code>BumpAllocator::new</code> and <code>Locked::new</code> as <a href="https://doc.rust-lang.org/reference/items/functions.html#const-functions"><code>const</code> functions</a>. If they were normal functions, a compilation error would occur because the initialization expression of a <code>static</code> must be evaluable at compile time.</p> <p>We don’t need to change the <code>ALLOCATOR.lock().init(HEAP_START, HEAP_SIZE)</code> call in our <code>init_heap</code> function because the bump allocator provides the same interface as the allocator provided by the <code>linked_list_allocator</code>.</p> <p>Now our kernel uses our bump allocator! 
Everything should still work, including the <a href="https://os.phil-opp.com/heap-allocation/#adding-a-test"><code>heap_allocation</code> tests</a> that we created in the previous post:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo test --test heap_allocation </span><span>[…] </span><span>Running 3 tests </span><span>simple_allocation... [ok] </span><span>large_vec... [ok] </span><span>many_boxes... [ok] </span></code></pre> <h3 id="discussion"><a class="zola-anchor" href="#discussion" aria-label="Anchor link for: discussion">🔗</a>Discussion</h3> <p>The big advantage of bump allocation is that it’s very fast. Compared to other allocator designs (see below) that need to actively look for a fitting memory block and perform various bookkeeping tasks on <code>alloc</code> and <code>dealloc</code>, a bump allocator <a href="https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html">can be optimized</a> to just a few assembly instructions. This makes bump allocators useful for optimizing the allocation performance, for example when creating a <a href="https://hacks.mozilla.org/2019/03/fast-bump-allocated-virtual-doms-with-rust-and-wasm/">virtual DOM library</a>.</p> <p>While a bump allocator is seldom used as the global allocator, the principle of bump allocation is often applied in the form of <a href="https://mgravell.github.io/Pipelines.Sockets.Unofficial/docs/arenas.html">arena allocation</a>, which basically batches individual allocations together to improve performance. 
An example of an arena allocator for Rust is contained in the <a href="https://docs.rs/toolshed/0.8.1/toolshed/index.html"><code>toolshed</code></a> crate.</p> <h4 id="the-drawback-of-a-bump-allocator"><a class="zola-anchor" href="#the-drawback-of-a-bump-allocator" aria-label="Anchor link for: the-drawback-of-a-bump-allocator">🔗</a>The Drawback of a Bump Allocator</h4> <p>The main limitation of a bump allocator is that it can only reuse deallocated memory after all allocations have been freed. This means that a single long-lived allocation suffices to prevent memory reuse. We can see this when we add a variation of the <code>many_boxes</code> test:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/heap_allocation.rs </span><span> </span><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>many_boxes_long_lived() { </span><span> </span><span style="color:#569cd6;">let</span><span> long_lived = Box::new(</span><span style="color:#b5cea8;">1</span><span>); </span><span style="color:#608b4e;">// new </span><span> </span><span style="color:#569cd6;">for</span><span> i </span><span style="color:#569cd6;">in </span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b4cea8;">HEAP_SIZE </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> x = Box::new(i); </span><span> assert_eq!(*x, i); </span><span> } </span><span> assert_eq!(*long_lived, </span><span style="color:#b5cea8;">1</span><span>); </span><span style="color:#608b4e;">// new </span><span>} </span></code></pre> <p>Like the <code>many_boxes</code> test, this test creates a large number of allocations to provoke an out-of-memory failure if the allocator does not reuse freed memory. 
Additionally, the test creates a <code>long_lived</code> allocation, which lives for the whole loop execution.</p> <p>When we try to run our new test, we see that it indeed fails:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo test --test heap_allocation </span><span>Running 4 tests </span><span>simple_allocation... [ok] </span><span>large_vec... [ok] </span><span>many_boxes... [ok] </span><span>many_boxes_long_lived... [failed] </span><span> </span><span>Error: panicked at &#39;allocation error: Layout { size_: 8, align_: 8 }&#39;, src/lib.rs:86:5 </span></code></pre> <p>Let’s try to understand why this failure occurs in detail: First, the <code>long_lived</code> allocation is created at the start of the heap, thereby increasing the <code>allocations</code> counter by 1. For each iteration of the loop, a short-lived allocation is created and directly freed again before the next iteration starts. This means that the <code>allocations</code> counter is temporarily increased to 2 at the beginning of an iteration and decreased to 1 at the end of it. The problem now is that the bump allocator can only reuse memory after <em>all</em> allocations have been freed, i.e., when the <code>allocations</code> counter falls to 0. Since this doesn’t happen before the end of the loop, each loop iteration allocates a new region of memory, leading to an out-of-memory error after a number of iterations.</p> <h4 id="fixing-the-test"><a class="zola-anchor" href="#fixing-the-test" aria-label="Anchor link for: fixing-the-test">🔗</a>Fixing the Test?</h4> <p>There are two potential tricks that we could utilize to fix the test for our bump allocator:</p> <ul> <li>We could update <code>dealloc</code> to check whether the freed allocation was the last allocation returned by <code>alloc</code> by comparing its end address with the <code>next</code> pointer. 
In case they’re equal, we can safely reset <code>next</code> back to the start address of the freed allocation. This way, each loop iteration reuses the same memory block.</li> <li>We could add an <code>alloc_back</code> method that allocates memory from the <em>end</em> of the heap using an additional <code>next_back</code> field. Then we could manually use this allocation method for all long-lived allocations, thereby separating short-lived and long-lived allocations on the heap. Note that this separation only works if it’s clear beforehand how long each allocation will live. Another drawback of this approach is that manually performing allocations is cumbersome and potentially unsafe.</li> </ul> <p>While both of these approaches work to fix the test, they are not a general solution since they are only able to reuse memory in very specific cases. The question is: Is there a general solution that reuses <em>all</em> freed memory?</p> <h4 id="reusing-all-freed-memory"><a class="zola-anchor" href="#reusing-all-freed-memory" aria-label="Anchor link for: reusing-all-freed-memory">🔗</a>Reusing All Freed Memory?</h4> <p>As we learned <a href="https://os.phil-opp.com/heap-allocation/#dynamic-memory">in the previous post</a>, allocations can live arbitrarily long and can be freed in an arbitrary order. This means that we need to keep track of a potentially unbounded number of non-contiguous, unused memory regions, as illustrated by the following example:</p> <p><img src="https://os.phil-opp.com/allocator-designs/allocation-fragmentation.svg" alt="" /></p> <p>The graphic shows the heap over the course of time. At the beginning, the complete heap is unused, and the <code>next</code> address is equal to <code>heap_start</code> (line 1). Then the first allocation occurs (line 2). In line 3, a second memory block is allocated and the first allocation is freed. Many more allocations are added in line 4. 
Half of them are very short-lived and already get freed in line 5, where another new allocation is also added.</p> <p>Line 5 shows the fundamental problem: We have five unused memory regions with different sizes, but the <code>next</code> pointer can only point to the beginning of the last region. While we could store the start addresses and sizes of the other unused memory regions in an array of size 4 for this example, this isn’t a general solution since we could easily create an example with 8, 16, or 1000 unused memory regions.</p> <p>Normally, when we have a potentially unbounded number of items, we can just use a heap-allocated collection. This isn’t really possible in our case, since the heap allocator can’t depend on itself (it would cause endless recursion or deadlocks). So we need to find a different solution.</p> <h2 id="linked-list-allocator"><a class="zola-anchor" href="#linked-list-allocator" aria-label="Anchor link for: linked-list-allocator">🔗</a>Linked List Allocator</h2> <p>A common trick to keep track of an arbitrary number of free memory areas when implementing allocators is to use these areas themselves as backing storage. This utilizes the fact that the regions are still mapped to a virtual address and backed by a physical frame, but the stored information is not needed anymore. By storing the information about the freed region in the region itself, we can keep track of an unbounded number of freed regions without needing additional memory.</p> <p>The most common implementation approach is to construct a singly linked list in the freed memory, with each node being a freed memory region:</p> <p><img src="https://os.phil-opp.com/allocator-designs/linked-list-allocation.svg" alt="" /></p> <p>Each list node contains two fields: the size of the memory region and a pointer to the next unused memory region. 
With this approach, we only need a pointer to the first unused region (called <code>head</code>) to keep track of all unused regions, regardless of their number. The resulting data structure is often called a <a href="https://en.wikipedia.org/wiki/Free_list"><em>free list</em></a>.</p> <p>As you might guess from the name, this is the technique that the <code>linked_list_allocator</code> crate uses. Allocators that use this technique are also often called <em>pool allocators</em>.</p> <h3 id="implementation-1"><a class="zola-anchor" href="#implementation-1" aria-label="Anchor link for: implementation-1">🔗</a>Implementation</h3> <p>In the following, we will create our own simple <code>LinkedListAllocator</code> type that uses the above approach for keeping track of freed memory regions. This part of the post isn’t required for future posts, so you can skip the implementation details if you like.</p> <h4 id="the-allocator-type"><a class="zola-anchor" href="#the-allocator-type" aria-label="Anchor link for: the-allocator-type">🔗</a>The Allocator Type</h4> <p>We start by creating a private <code>ListNode</code> struct in a new <code>allocator::linked_list</code> submodule:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator.rs </span><span> </span><span style="color:#569cd6;">pub mod </span><span>linked_list; </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/linked_list.rs </span><span> </span><span style="color:#569cd6;">struct </span><span>ListNode { </span><span> size: </span><span style="color:#569cd6;">usize</span><span>, </span><span> next: Option&lt;</span><span style="color:#569cd6;">&amp;&#39;static mut</span><span> ListNode&gt;, </span><span>} </span></code></pre> 
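<p>To make the idea of using the free regions themselves as backing storage more concrete, here is a minimal standalone sketch. It is not the implementation developed in this post: it assumes a normal <code>std</code> program, uses a stack array as a stand-in for the heap, and works with raw pointers instead of references. The names <code>FreeNode</code>, <code>push_region</code>, <code>summarize</code>, and <code>demo</code> are made up for this illustration.</p>

```rust
use std::mem;
use std::ptr;

/// Header written to the start of every free region (hypothetical name,
/// raw-pointer variant used only for this sketch).
struct FreeNode {
    size: usize,
    next: *mut FreeNode,
}

/// Pushes the region `[addr, addr + size)` to the front of the free list.
///
/// Safety: the region must be valid, unused, aligned for `FreeNode`, and
/// at least `size_of::<FreeNode>()` bytes large.
unsafe fn push_region(head: &mut *mut FreeNode, addr: *mut u8, size: usize) {
    assert!(size >= mem::size_of::<FreeNode>());
    assert_eq!(addr as usize % mem::align_of::<FreeNode>(), 0);
    let node = addr as *mut FreeNode;
    // The node lives inside the free region itself.
    node.write(FreeNode { size, next: *head });
    *head = node;
}

/// Walks the list and returns (number of free regions, total free bytes).
///
/// Safety: every node reachable from `head` must still be valid.
unsafe fn summarize(head: *mut FreeNode) -> (usize, usize) {
    let mut count = 0;
    let mut bytes = 0;
    let mut current = head;
    while !current.is_null() {
        count += 1;
        bytes += (*current).size;
        current = (*current).next;
    }
    (count, bytes)
}

/// Builds a free list with three non-adjacent regions inside a 256-byte
/// buffer that plays the role of the heap.
fn demo() -> (usize, usize) {
    // `u64` elements keep the buffer 8-byte aligned.
    let mut heap = [0u64; 32];
    let base = heap.as_mut_ptr() as *mut u8;
    let mut head: *mut FreeNode = ptr::null_mut();
    unsafe {
        push_region(&mut head, base, 64);
        push_region(&mut head, base.add(128), 32);
        push_region(&mut head, base.add(192), 64);
        summarize(head)
    }
}
```

<p>In this sketch, <code>demo()</code> reports three free regions with 160 free bytes in total. Apart from the single <code>head</code> pointer, all bookkeeping data is stored inside the free regions.</p>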
<p>Like in the graphic, a list node has a <code>size</code> field and an optional pointer to the next node, represented by the <code>Option&lt;&amp;'static mut ListNode&gt;</code> type. The <code>&amp;'static mut</code> type semantically describes an <a href="https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html">owned</a> object behind a pointer. Basically, it’s a <a href="https://doc.rust-lang.org/alloc/boxed/index.html"><code>Box</code></a> without a destructor that frees the object at the end of the scope.</p> <p>We implement the following set of methods for <code>ListNode</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/linked_list.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>ListNode { </span><span> </span><span style="color:#569cd6;">const fn </span><span>new(size: </span><span style="color:#569cd6;">usize</span><span>) -&gt; </span><span style="color:#569cd6;">Self </span><span>{ </span><span> ListNode { size, next: None } </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>start_addr(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; </span><span style="color:#569cd6;">usize </span><span>{ </span><span> self </span><span style="color:#569cd6;">as *const Self as usize </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>end_addr(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; </span><span style="color:#569cd6;">usize </span><span>{ </span><span> self.start_addr() + self.size </span><span> } </span><span>} </span></code></pre> <p>The type has a simple constructor function named <code>new</code> and methods to calculate the start and end addresses of the represented region. 
We make the <code>new</code> function a <a href="https://doc.rust-lang.org/reference/items/functions.html#const-functions">const function</a>, which will be required later when constructing a static linked list allocator. Note that any use of mutable references in const functions (including setting the <code>next</code> field to <code>None</code>) is still unstable. In order to get it to compile, we need to add <strong><code>#![feature(const_mut_refs)]</code></strong> to the beginning of our <code>lib.rs</code>.</p> <p>With the <code>ListNode</code> struct as a building block, we can now create the <code>LinkedListAllocator</code> struct:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/linked_list.rs </span><span> </span><span style="color:#569cd6;">pub struct </span><span>LinkedListAllocator { </span><span> head: ListNode, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>LinkedListAllocator { </span><span> </span><span style="color:#608b4e;">/// Creates an empty LinkedListAllocator. </span><span> </span><span style="color:#569cd6;">pub const fn </span><span>new() -&gt; </span><span style="color:#569cd6;">Self </span><span>{ </span><span> </span><span style="color:#569cd6;">Self </span><span>{ </span><span> head: ListNode::new(</span><span style="color:#b5cea8;">0</span><span>), </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">/// Initialize the allocator with the given heap bounds. </span><span> </span><span style="color:#608b4e;">/// </span><span> </span><span style="color:#608b4e;">/// This function is unsafe because the caller must guarantee that the given </span><span> </span><span style="color:#608b4e;">/// heap bounds are valid and that the heap is unused. 
This method must be </span><span> </span><span style="color:#608b4e;">/// called only once. </span><span> </span><span style="color:#569cd6;">pub unsafe fn </span><span>init(</span><span style="color:#569cd6;">&amp;mut </span><span>self, heap_start: </span><span style="color:#569cd6;">usize</span><span>, heap_size: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> self.add_free_region(heap_start, heap_size); </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">/// Adds the given memory region to the front of the list. </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>add_free_region(</span><span style="color:#569cd6;">&amp;mut </span><span>self, addr: </span><span style="color:#569cd6;">usize</span><span>, size: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> todo!(); </span><span> } </span><span>} </span></code></pre> <p>The struct contains a <code>head</code> node that points to the first heap region. We are only interested in the value of the <code>next</code> pointer, so we set the <code>size</code> to 0 in the <code>ListNode::new</code> function. Making <code>head</code> a <code>ListNode</code> instead of just a <code>&amp;'static mut ListNode</code> has the advantage that the implementation of the <code>alloc</code> method will be simpler.</p> <p>Like for the bump allocator, the <code>new</code> function doesn’t initialize the allocator with the heap bounds. In addition to maintaining API compatibility, the reason is that the initialization routine requires writing a node to the heap memory, which can only happen at runtime. The <code>new</code> function, however, needs to be a <a href="https://doc.rust-lang.org/reference/items/functions.html#const-functions"><code>const</code> function</a> that can be evaluated at compile time because it will be used for initializing the <code>ALLOCATOR</code> static. 
For this reason, we again provide a separate, non-constant <code>init</code> method.</p> <p>The <code>init</code> method uses an <code>add_free_region</code> method, whose implementation will be shown in a moment. For now, we use the <a href="https://doc.rust-lang.org/core/macro.todo.html"><code>todo!</code></a> macro to provide a placeholder implementation that always panics.</p> <h4 id="the-add-free-region-method"><a class="zola-anchor" href="#the-add-free-region-method" aria-label="Anchor link for: the-add-free-region-method">🔗</a>The <code>add_free_region</code> Method</h4> <p>The <code>add_free_region</code> method provides the fundamental <em>push</em> operation on the linked list. We currently only call this method from <code>init</code>, but it will also be the central method in our <code>dealloc</code> implementation. Remember, the <code>dealloc</code> method is called when an allocated memory region is freed again. To keep track of this freed memory region, we want to push it to the linked list.</p> <p>The implementation of the <code>add_free_region</code> method looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/linked_list.rs </span><span> </span><span style="color:#569cd6;">use super</span><span>::align_up; </span><span style="color:#569cd6;">use </span><span>core::mem; </span><span> </span><span style="color:#569cd6;">impl </span><span>LinkedListAllocator { </span><span> </span><span style="color:#608b4e;">/// Adds the given memory region to the front of the list. 
</span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>add_free_region(</span><span style="color:#569cd6;">&amp;mut </span><span>self, addr: </span><span style="color:#569cd6;">usize</span><span>, size: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> </span><span style="color:#608b4e;">// ensure that the freed region is capable of holding a ListNode </span><span> assert_eq!(align_up(addr, mem::align_of::&lt;ListNode&gt;()), addr); </span><span> assert!(size &gt;= mem::size_of::&lt;ListNode&gt;()); </span><span> </span><span> </span><span style="color:#608b4e;">// create a new list node and insert it at the start of the list </span><span> </span><span style="color:#569cd6;">let mut</span><span> node = ListNode::new(size); </span><span> node.next = self.head.next.take(); </span><span> </span><span style="color:#569cd6;">let</span><span> node_ptr = addr </span><span style="color:#569cd6;">as *mut</span><span> ListNode; </span><span> node_ptr.write(node); </span><span> self.head.next = Some(</span><span style="color:#569cd6;">&amp;mut </span><span>*node_ptr) </span><span> } </span><span>} </span></code></pre> <p>The method takes the address and size of a memory region as arguments and adds the region to the front of the list. First, it ensures that the given region has the necessary size and alignment for storing a <code>ListNode</code>. Then it creates the node and inserts it into the list through the following steps:</p> <p><img src="https://os.phil-opp.com/allocator-designs/linked-list-allocator-push.svg" alt="" /></p> <p>Step 0 shows the state of the heap before <code>add_free_region</code> is called. In step 1, the method is called with the memory region marked as <code>freed</code> in the graphic. After the initial checks, the method creates a new <code>node</code> on its stack with the size of the freed region. 
It then uses the <a href="https://doc.rust-lang.org/core/option/enum.Option.html#method.take"><code>Option::take</code></a> method to set the <code>next</code> pointer of the node to the current <code>head</code> pointer, thereby resetting the <code>head</code> pointer to <code>None</code>.</p> <p>In step 2, the method writes the newly created <code>node</code> to the beginning of the freed memory region through the <a href="https://doc.rust-lang.org/std/primitive.pointer.html#method.write"><code>write</code></a> method. It then points the <code>head</code> pointer to the new node. The resulting pointer structure looks a bit chaotic because the freed region is always inserted at the beginning of the list, but if we follow the pointers, we see that each free region is still reachable from the <code>head</code> pointer.</p> <h4 id="the-find-region-method"><a class="zola-anchor" href="#the-find-region-method" aria-label="Anchor link for: the-find-region-method">🔗</a>The <code>find_region</code> Method</h4> <p>The second fundamental operation on a linked list is finding an entry and removing it from the list. This is the central operation needed for implementing the <code>alloc</code> method. We implement the operation as a <code>find_region</code> method in the following way:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/linked_list.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>LinkedListAllocator { </span><span> </span><span style="color:#608b4e;">/// Looks for a free region with the given size and alignment and removes </span><span> </span><span style="color:#608b4e;">/// it from the list. </span><span> </span><span style="color:#608b4e;">/// </span><span> </span><span style="color:#608b4e;">/// Returns a tuple of the list node and the start address of the allocation. 
</span><span> </span><span style="color:#569cd6;">fn </span><span>find_region(</span><span style="color:#569cd6;">&amp;mut </span><span>self, size: </span><span style="color:#569cd6;">usize</span><span>, align: </span><span style="color:#569cd6;">usize</span><span>) </span><span> -&gt; Option&lt;(</span><span style="color:#569cd6;">&amp;&#39;static mut</span><span> ListNode, </span><span style="color:#569cd6;">usize</span><span>)&gt; </span><span> { </span><span> </span><span style="color:#608b4e;">// reference to current list node, updated for each iteration </span><span> </span><span style="color:#569cd6;">let mut</span><span> current = </span><span style="color:#569cd6;">&amp;mut </span><span>self.head; </span><span> </span><span style="color:#608b4e;">// look for a large enough memory region in linked list </span><span> </span><span style="color:#569cd6;">while let </span><span>Some(</span><span style="color:#569cd6;">ref mut</span><span> region) = current.next { </span><span> </span><span style="color:#569cd6;">if let </span><span>Ok(alloc_start) = </span><span style="color:#569cd6;">Self</span><span>::alloc_from_region(</span><span style="color:#569cd6;">&amp;</span><span>region, size, align) { </span><span> </span><span style="color:#608b4e;">// region suitable for allocation -&gt; remove node from list </span><span> </span><span style="color:#569cd6;">let</span><span> next = region.next.take(); </span><span> </span><span style="color:#569cd6;">let</span><span> ret = Some((current.next.take().unwrap(), alloc_start)); </span><span> current.next = next; </span><span> </span><span style="color:#569cd6;">return</span><span> ret; </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> </span><span style="color:#608b4e;">// region not suitable -&gt; continue with next region </span><span> current = current.next.as_mut().unwrap(); </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">// no 
suitable region found </span><span> None </span><span> } </span><span>} </span></code></pre> <p>The method uses a <code>current</code> variable and a <a href="https://doc.rust-lang.org/reference/expressions/loop-expr.html#predicate-pattern-loops"><code>while let</code> loop</a> to iterate over the list elements. At the beginning, <code>current</code> is set to the (dummy) <code>head</code> node. On each iteration, it is then updated to the <code>next</code> field of the current node (in the <code>else</code> block). If the region is suitable for an allocation with the given size and alignment, the region is removed from the list and returned together with the <code>alloc_start</code> address.</p> <p>When the <code>current.next</code> pointer becomes <code>None</code>, the loop exits. This means we iterated over the whole list but found no region suitable for an allocation. In that case, we return <code>None</code>. Whether a region is suitable is checked by the <code>alloc_from_region</code> function, whose implementation will be shown in a moment.</p> <p>Let’s take a more detailed look at how a suitable region is removed from the list:</p> <p><img src="https://os.phil-opp.com/allocator-designs/linked-list-allocator-remove-region.svg" alt="" /></p> <p>Step 0 shows the situation before any pointer adjustments. The <code>region</code> and <code>current</code> regions and the <code>region.next</code> and <code>current.next</code> pointers are marked in the graphic. In step 1, both the <code>region.next</code> and <code>current.next</code> pointers are reset to <code>None</code> by using the <a href="https://doc.rust-lang.org/core/option/enum.Option.html#method.take"><code>Option::take</code></a> method. The original pointers are stored in local variables called <code>next</code> and <code>ret</code>.</p> <p>In step 2, the <code>current.next</code> pointer is set to the local <code>next</code> pointer, which is the original <code>region.next</code> pointer. 
The effect is that <code>current</code> now directly points to the region after <code>region</code>, so that <code>region</code> is no longer an element of the linked list. The function then returns the pointer to <code>region</code> stored in the local <code>ret</code> variable.</p> <h5 id="the-alloc-from-region-function"><a class="zola-anchor" href="#the-alloc-from-region-function" aria-label="Anchor link for: the-alloc-from-region-function">🔗</a>The <code>alloc_from_region</code> Function</h5> <p>The <code>alloc_from_region</code> function returns whether a region is suitable for an allocation with a given size and alignment. It is defined like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/linked_list.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>LinkedListAllocator { </span><span> </span><span style="color:#608b4e;">/// Try to use the given region for an allocation with given size and </span><span> </span><span style="color:#608b4e;">/// alignment. </span><span> </span><span style="color:#608b4e;">/// </span><span> </span><span style="color:#608b4e;">/// Returns the allocation start address on success. 
</span><span> </span><span style="color:#569cd6;">fn </span><span>alloc_from_region(region: </span><span style="color:#569cd6;">&amp;</span><span>ListNode, size: </span><span style="color:#569cd6;">usize</span><span>, align: </span><span style="color:#569cd6;">usize</span><span>) </span><span> -&gt; Result&lt;</span><span style="color:#569cd6;">usize</span><span>, ()&gt; </span><span> { </span><span> </span><span style="color:#569cd6;">let</span><span> alloc_start = align_up(region.start_addr(), align); </span><span> </span><span style="color:#569cd6;">let</span><span> alloc_end = alloc_start.checked_add(size).ok_or(())</span><span style="color:#569cd6;">?</span><span>; </span><span> </span><span> </span><span style="color:#569cd6;">if</span><span> alloc_end &gt; region.end_addr() { </span><span> </span><span style="color:#608b4e;">// region too small </span><span> </span><span style="color:#569cd6;">return </span><span>Err(()); </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> excess_size = region.end_addr() - alloc_end; </span><span> </span><span style="color:#569cd6;">if</span><span> excess_size &gt; </span><span style="color:#b5cea8;">0 </span><span style="color:#569cd6;">&amp;&amp;</span><span> excess_size &lt; mem::size_of::&lt;ListNode&gt;() { </span><span> </span><span style="color:#608b4e;">// rest of region too small to hold a ListNode (required because the </span><span> </span><span style="color:#608b4e;">// allocation splits the region in a used and a free part) </span><span> </span><span style="color:#569cd6;">return </span><span>Err(()); </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">// region suitable for allocation </span><span> Ok(alloc_start) </span><span> } </span><span>} </span></code></pre> <p>First, the function calculates the start and end address of a potential allocation, using the <code>align_up</code> function we defined earlier and the <a 
href="https://doc.rust-lang.org/std/primitive.usize.html#method.checked_add"><code>checked_add</code></a> method. If an overflow occurs or if the end address is behind the end address of the region, the allocation doesn’t fit in the region and we return an error.</p> <p>The function performs a less obvious check after that. This check is necessary because most of the time an allocation does not fit a suitable region perfectly, so that a part of the region remains usable after the allocation. This part of the region must store its own <code>ListNode</code> after the allocation, so it must be large enough to do so. The check verifies exactly that: either the allocation fits perfectly (<code>excess_size == 0</code>) or the excess size is large enough to store a <code>ListNode</code>.</p> <h4 id="implementing-globalalloc-1"><a class="zola-anchor" href="#implementing-globalalloc-1" aria-label="Anchor link for: implementing-globalalloc-1">🔗</a>Implementing <code>GlobalAlloc</code></h4> <p>With the fundamental operations provided by the <code>add_free_region</code> and <code>find_region</code> methods, we can now finally implement the <code>GlobalAlloc</code> trait. As with the bump allocator, we don’t implement the trait directly for the <code>LinkedListAllocator</code> but only for a wrapped <code>Locked&lt;LinkedListAllocator&gt;</code>. 
The <a href="https://os.phil-opp.com/allocator-designs/#a-locked-wrapper-type"><code>Locked</code> wrapper</a> adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the <code>alloc</code> and <code>dealloc</code> methods only take <code>&amp;self</code> references.</p> <p>The implementation looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/linked_list.rs </span><span> </span><span style="color:#569cd6;">use super</span><span>::Locked; </span><span style="color:#569cd6;">use </span><span>alloc::alloc::{GlobalAlloc, Layout}; </span><span style="color:#569cd6;">use </span><span>core::ptr; </span><span> </span><span style="color:#569cd6;">unsafe impl </span><span>GlobalAlloc </span><span style="color:#569cd6;">for </span><span>Locked&lt;LinkedListAllocator&gt; { </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>alloc(</span><span style="color:#569cd6;">&amp;</span><span>self, layout: Layout) -&gt; </span><span style="color:#569cd6;">*mut u8 </span><span>{ </span><span> </span><span style="color:#608b4e;">// perform layout adjustments </span><span> </span><span style="color:#569cd6;">let </span><span>(size, align) = LinkedListAllocator::size_align(layout); </span><span> </span><span style="color:#569cd6;">let mut</span><span> allocator = self.lock(); </span><span> </span><span> </span><span style="color:#569cd6;">if let </span><span>Some((region, alloc_start)) = allocator.find_region(size, align) { </span><span> </span><span style="color:#569cd6;">let</span><span> alloc_end = alloc_start.checked_add(size).expect(</span><span style="color:#d69d85;">&quot;overflow&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">let</span><span> excess_size = region.end_addr() - alloc_end; </span><span> </span><span 
style="color:#569cd6;">if</span><span> excess_size &gt; </span><span style="color:#b5cea8;">0 </span><span>{ </span><span> allocator.add_free_region(alloc_end, excess_size); </span><span> } </span><span> alloc_start </span><span style="color:#569cd6;">as *mut u8 </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> ptr::null_mut() </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>dealloc(</span><span style="color:#569cd6;">&amp;</span><span>self, ptr: </span><span style="color:#569cd6;">*mut u8</span><span>, layout: Layout) { </span><span> </span><span style="color:#608b4e;">// perform layout adjustments </span><span> </span><span style="color:#569cd6;">let </span><span>(size, </span><span style="color:#569cd6;">_</span><span>) = LinkedListAllocator::size_align(layout); </span><span> </span><span> self.lock().add_free_region(ptr </span><span style="color:#569cd6;">as usize</span><span>, size) </span><span> } </span><span>} </span></code></pre> <p>Let’s start with the <code>dealloc</code> method because it is simpler: First, it performs some layout adjustments, which we will explain in a moment. Then, it retrieves a <code>&amp;mut LinkedListAllocator</code> reference by calling the <a href="https://docs.rs/spin/0.5.0/spin/struct.Mutex.html#method.lock"><code>Mutex::lock</code></a> function on the <a href="https://os.phil-opp.com/allocator-designs/#a-locked-wrapper-type"><code>Locked</code> wrapper</a>. Lastly, it calls the <code>add_free_region</code> function to add the deallocated region to the free list.</p> <p>The <code>alloc</code> method is a bit more complex. It starts with the same layout adjustments and also calls the <a href="https://docs.rs/spin/0.5.0/spin/struct.Mutex.html#method.lock"><code>Mutex::lock</code></a> function to receive a mutable allocator reference. 
Then it uses the <code>find_region</code> method to find a suitable memory region for the allocation and remove it from the list. If this doesn’t succeed and <code>None</code> is returned, it returns <code>null_mut</code> to signal an error as there is no suitable memory region.</p> <p>In the success case, the <code>find_region</code> method returns a tuple of the suitable region (no longer in the list) and the start address of the allocation. Using <code>alloc_start</code>, the allocation size, and the end address of the region, it calculates the end address of the allocation and the excess size again. If the excess size is not zero, it calls <code>add_free_region</code> to add the excess part of the memory region back to the free list. Finally, it returns the <code>alloc_start</code> address cast to a <code>*mut u8</code> pointer.</p> <h4 id="layout-adjustments"><a class="zola-anchor" href="#layout-adjustments" aria-label="Anchor link for: layout-adjustments">🔗</a>Layout Adjustments</h4> <p>So what are these layout adjustments that we make at the beginning of both <code>alloc</code> and <code>dealloc</code>? They ensure that each allocated block is capable of storing a <code>ListNode</code>. This is important because the memory block is going to be deallocated at some point, and we then want to write a <code>ListNode</code> to it. 
If the block is smaller than a <code>ListNode</code> or does not have the correct alignment, undefined behavior can occur.</p> <p>The layout adjustments are performed by the <code>size_align</code> function, which is defined like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/linked_list.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>LinkedListAllocator { </span><span> </span><span style="color:#608b4e;">/// Adjust the given layout so that the resulting allocated memory </span><span> </span><span style="color:#608b4e;">/// region is also capable of storing a `ListNode`. </span><span> </span><span style="color:#608b4e;">/// </span><span> </span><span style="color:#608b4e;">/// Returns the adjusted size and alignment as a (size, align) tuple. </span><span> </span><span style="color:#569cd6;">fn </span><span>size_align(layout: Layout) -&gt; (</span><span style="color:#569cd6;">usize</span><span>, </span><span style="color:#569cd6;">usize</span><span>) { </span><span> </span><span style="color:#569cd6;">let</span><span> layout = layout </span><span> .align_to(mem::align_of::&lt;ListNode&gt;()) </span><span> .expect(</span><span style="color:#d69d85;">&quot;adjusting alignment failed&quot;</span><span>) </span><span> .pad_to_align(); </span><span> </span><span style="color:#569cd6;">let</span><span> size = layout.size().max(mem::size_of::&lt;ListNode&gt;()); </span><span> (size, layout.align()) </span><span> } </span><span>} </span></code></pre> <p>First, the function uses the <a href="https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align_to"><code>align_to</code></a> method on the passed <a href="https://doc.rust-lang.org/alloc/alloc/struct.Layout.html"><code>Layout</code></a> to increase the alignment to the alignment of a <code>ListNode</code> if necessary. 
It then uses the <a href="https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.pad_to_align"><code>pad_to_align</code></a> method to round up the size to a multiple of the alignment to ensure that the start address of the next memory block will have the correct alignment for storing a <code>ListNode</code> too. In the second step, it uses the <a href="https://doc.rust-lang.org/std/cmp/trait.Ord.html#method.max"><code>max</code></a> method to enforce a minimum allocation size of <code>mem::size_of::&lt;ListNode&gt;</code>. This way, the <code>dealloc</code> function can safely write a <code>ListNode</code> to the freed memory block.</p> <h3 id="using-it-1"><a class="zola-anchor" href="#using-it-1" aria-label="Anchor link for: using-it-1">🔗</a>Using it</h3> <p>We can now update the <code>ALLOCATOR</code> static in the <code>allocator</code> module to use our new <code>LinkedListAllocator</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator.rs </span><span> </span><span style="color:#569cd6;">use </span><span>linked_list::LinkedListAllocator; </span><span> </span><span>#[global_allocator] </span><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">ALLOCATOR</span><span>: Locked&lt;LinkedListAllocator&gt; = </span><span> Locked::new(LinkedListAllocator::new()); </span></code></pre> <p>Since the <code>init</code> function behaves the same for the bump and linked list allocators, we don’t need to modify the <code>init</code> call in <code>init_heap</code>.</p> <p>When we run our <code>heap_allocation</code> tests again, we see that all tests now pass, including the <code>many_boxes_long_lived</code> test that failed with the bump allocator:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo test --test heap_allocation </span><span>simple_allocation... 
[ok] </span><span>large_vec... [ok] </span><span>many_boxes... [ok] </span><span>many_boxes_long_lived... [ok] </span></code></pre> <p>This shows that our linked list allocator is able to reuse freed memory for subsequent allocations.</p> <h3 id="discussion-1"><a class="zola-anchor" href="#discussion-1" aria-label="Anchor link for: discussion-1">🔗</a>Discussion</h3> <p>In contrast to the bump allocator, the linked list allocator is much more suitable as a general-purpose allocator, mainly because it is able to directly reuse freed memory. However, it also has some drawbacks. Some of them are only caused by our basic implementation, but there are also fundamental drawbacks of the allocator design itself.</p> <h4 id="merging-freed-blocks"><a class="zola-anchor" href="#merging-freed-blocks" aria-label="Anchor link for: merging-freed-blocks">🔗</a>Merging Freed Blocks</h4> <p>The main problem with our implementation is that it only splits the heap into smaller blocks but never merges them back together. Consider this example:</p> <p><img src="https://os.phil-opp.com/allocator-designs/linked-list-allocator-fragmentation-on-dealloc.svg" alt="" /></p> <p>In the first line, three allocations are created on the heap. Two of them are freed again in line 2 and the third is freed in line 3. Now the complete heap is unused again, but it is still split into four individual blocks. At this point, a large allocation might not be possible anymore because none of the four blocks is large enough. Over time, the process continues, and the heap is split into smaller and smaller blocks. At some point, the heap is so fragmented that even normal sized allocations will fail.</p> <p>To fix this problem, we need to merge adjacent freed blocks back together. 
For the above example, this would mean the following:</p> <p><img src="https://os.phil-opp.com/allocator-designs/linked-list-allocator-merge-on-dealloc.svg" alt="" /></p> <p>Like before, two of the three allocations are freed in line <code>2</code>. Instead of keeping the fragmented heap, we now perform an additional step in line <code>2a</code> to merge the two rightmost blocks back together. In line <code>3</code>, the third allocation is freed (like before), resulting in a completely unused heap represented by three distinct blocks. In an additional merging step in line <code>3a</code>, we then merge the three adjacent blocks back together.</p> <p>The <code>linked_list_allocator</code> crate implements this merging strategy in the following way: Instead of inserting freed memory blocks at the beginning of the linked list on <code>deallocate</code>, it always keeps the list sorted by start address. This way, merging can be performed directly on the <code>deallocate</code> call by examining the addresses and sizes of the two neighboring blocks in the list. Of course, the deallocation operation is slower this way, but it prevents the heap fragmentation we saw above.</p> <h4 id="performance"><a class="zola-anchor" href="#performance" aria-label="Anchor link for: performance">🔗</a>Performance</h4> <p>As we learned above, the bump allocator is extremely fast and can be optimized to just a few assembly operations. The linked list allocator performs much worse in this category. The problem is that an allocation request might need to traverse the complete linked list until it finds a suitable block.</p> <p>Since the list length depends on the number of unused memory blocks, the performance can vary extremely for different programs. A program that only creates a couple of allocations will experience relatively fast allocation performance. 
For a program that fragments the heap with many allocations, however, the allocation performance will be very bad because the linked list will be very long and mostly contain very small blocks.</p> <p>It’s worth noting that this performance issue isn’t a problem caused by our basic implementation but a fundamental problem of the linked list approach. Since allocation performance can be very important for kernel-level code, we explore a third allocator design in the following that improves performance at the cost of reduced memory utilization.</p> <h2 id="fixed-size-block-allocator"><a class="zola-anchor" href="#fixed-size-block-allocator" aria-label="Anchor link for: fixed-size-block-allocator">🔗</a>Fixed-Size Block Allocator</h2> <p>In the following, we present an allocator design that uses fixed-size memory blocks for fulfilling allocation requests. This way, the allocator often returns blocks that are larger than needed for allocations, which results in wasted memory due to <a href="https://en.wikipedia.org/wiki/Fragmentation_(computing)#Internal_fragmentation">internal fragmentation</a>. On the other hand, it drastically reduces the time required to find a suitable block (compared to the linked list allocator), resulting in much better allocation performance.</p> <h3 id="introduction-1"><a class="zola-anchor" href="#introduction-1" aria-label="Anchor link for: introduction-1">🔗</a>Introduction</h3> <p>The idea behind a <em>fixed-size block allocator</em> is the following: Instead of allocating exactly as much memory as requested, we define a small number of block sizes and round up each allocation to the next block size. For example, with block sizes of 16, 64, and 512 bytes, an allocation of 4 bytes would return a 16-byte block, an allocation of 48 bytes a 64-byte block, and an allocation of 128 bytes a 512-byte block.</p> <p>As with the linked list allocator, we keep track of the unused memory by creating a linked list in the unused memory. 
However, instead of using a single list with different block sizes, we create a separate list for each size class. Each list then only stores blocks of a single size. For example, with block sizes of 16, 64, and 512, there would be three separate linked lists in memory:</p> <p><img src="https://os.phil-opp.com/allocator-designs/fixed-size-block-example.svg" alt="" /></p> <p>Instead of a single <code>head</code> pointer, we have the three head pointers <code>head_16</code>, <code>head_64</code>, and <code>head_512</code> that each point to the first unused block of the corresponding size. All nodes in a single list have the same size. For example, the list started by the <code>head_16</code> pointer only contains 16-byte blocks. This means that we no longer need to store the size in each list node since it is already specified by the name of the head pointer.</p> <p>Since each element in a list has the same size, each list element is equally suitable for an allocation request. This means that we can very efficiently perform an allocation using the following steps:</p> <ul> <li>Round up the requested allocation size to the next block size. For example, when an allocation of 12 bytes is requested, we would choose the block size of 16 in the above example.</li> <li>Retrieve the head pointer for the list. For example, for block size 16, we need to use <code>head_16</code>.</li> <li>Remove the first block from the list and return it.</li> </ul> <p>Most notably, we can always return the first element of the list and no longer need to traverse the full list. Thus, allocations are much faster than with the linked list allocator.</p> <h4 id="block-sizes-and-wasted-memory"><a class="zola-anchor" href="#block-sizes-and-wasted-memory" aria-label="Anchor link for: block-sizes-and-wasted-memory">🔗</a>Block Sizes and Wasted Memory</h4> <p>Depending on the block sizes, we lose a lot of memory by rounding up. 
For example, when a 512-byte block is returned for a 128-byte allocation, three-quarters of the allocated memory is unused. By defining reasonable block sizes, it is possible to limit the amount of wasted memory to some degree. For example, when using the powers of 2 (4, 8, 16, 32, 64, 128, …) as block sizes, we can limit the memory waste to half of the allocated block size in the worst case and a quarter of the allocated block size in the average case.</p> <p>It is also common to optimize block sizes based on common allocation sizes in a program. For example, we could additionally add block size 24 to improve memory usage for programs that often perform allocations of 24 bytes. This way, the amount of wasted memory can often be reduced without losing the performance benefits.</p> <h4 id="deallocation"><a class="zola-anchor" href="#deallocation" aria-label="Anchor link for: deallocation">🔗</a>Deallocation</h4> <p>Much like allocation, deallocation is also very performant. It involves the following steps:</p> <ul> <li>Round up the freed allocation size to the next block size. This is required since the compiler only passes the requested allocation size to <code>dealloc</code>, not the size of the block that was returned by <code>alloc</code>. By using the same size-adjustment function in both <code>alloc</code> and <code>dealloc</code>, we can make sure that we always free the correct amount of memory.</li> <li>Retrieve the head pointer for the list.</li> <li>Add the freed block to the front of the list by updating the head pointer.</li> </ul> <p>Most notably, no traversal of the list is required for deallocation either. 
This means that the time required for a <code>dealloc</code> call stays the same regardless of the list length.</p> <h4 id="fallback-allocator"><a class="zola-anchor" href="#fallback-allocator" aria-label="Anchor link for: fallback-allocator">🔗</a>Fallback Allocator</h4> <p>Given that large allocations (&gt;2 KB) are often rare, especially in operating system kernels, it might make sense to fall back to a different allocator for these allocations. For example, we could fall back to a linked list allocator for allocations greater than 2048 bytes in order to reduce memory waste. Since only very few allocations of that size are expected, the linked list would stay small and the (de)allocations would still be reasonably fast.</p> <h4 id="creating-new-blocks"><a class="zola-anchor" href="#creating-new-blocks" aria-label="Anchor link for: creating-new-blocks">🔗</a>Creating new Blocks</h4> <p>Above, we assumed that there are always enough blocks of a specific size in the list to fulfill all allocation requests. However, at some point, the linked list for a given block size becomes empty. At this point, there are two ways we can create new unused blocks of a specific size to fulfill an allocation request:</p> <ul> <li>Allocate a new block from the fallback allocator (if there is one).</li> <li>Split a larger block from a different list. This works best if block sizes are powers of two. For example, a 32-byte block can be split into two 16-byte blocks.</li> </ul> <p>For our implementation, we will allocate new blocks from the fallback allocator since this approach is much simpler to implement.</p> <h3 id="implementation-2"><a class="zola-anchor" href="#implementation-2" aria-label="Anchor link for: implementation-2">🔗</a>Implementation</h3> <p>Now that we know how a fixed-size block allocator works, we can start our implementation. 
We won’t depend on the implementation of the linked list allocator created in the previous section, so you can follow this part even if you skipped the linked list allocator implementation.</p> <h4 id="list-node"><a class="zola-anchor" href="#list-node" aria-label="Anchor link for: list-node">🔗</a>List Node</h4> <p>We start our implementation by creating a <code>ListNode</code> type in a new <code>allocator::fixed_size_block</code> module:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator.rs </span><span> </span><span style="color:#569cd6;">pub mod </span><span>fixed_size_block; </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/fixed_size_block.rs </span><span> </span><span style="color:#569cd6;">struct </span><span>ListNode { </span><span> next: Option&lt;</span><span style="color:#569cd6;">&amp;&#39;static mut</span><span> ListNode&gt;, </span><span>} </span></code></pre> <p>This type is similar to the <code>ListNode</code> type of our <a href="https://os.phil-opp.com/allocator-designs/#the-allocator-type">linked list allocator implementation</a>, with the difference that we don’t have a <code>size</code> field. 
It isn’t needed because, in the fixed-size block allocator design, every block in a list has the same size.</p> <h4 id="block-sizes"><a class="zola-anchor" href="#block-sizes" aria-label="Anchor link for: block-sizes">🔗</a>Block Sizes</h4> <p>Next, we define a constant <code>BLOCK_SIZES</code> slice with the block sizes used for our implementation:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/fixed_size_block.rs </span><span> </span><span style="color:#608b4e;">/// The block sizes to use. </span><span style="color:#608b4e;">/// </span><span style="color:#608b4e;">/// The sizes must each be a power of 2 because they are also used as </span><span style="color:#608b4e;">/// the block alignment (alignments must always be powers of 2). </span><span style="color:#569cd6;">const </span><span style="color:#b4cea8;">BLOCK_SIZES</span><span>: </span><span style="color:#569cd6;">&amp;</span><span>[</span><span style="color:#569cd6;">usize</span><span>] = </span><span style="color:#569cd6;">&amp;</span><span>[</span><span style="color:#b5cea8;">8</span><span>, </span><span style="color:#b5cea8;">16</span><span>, </span><span style="color:#b5cea8;">32</span><span>, </span><span style="color:#b5cea8;">64</span><span>, </span><span style="color:#b5cea8;">128</span><span>, </span><span style="color:#b5cea8;">256</span><span>, </span><span style="color:#b5cea8;">512</span><span>, </span><span style="color:#b5cea8;">1024</span><span>, </span><span style="color:#b5cea8;">2048</span><span>]; </span></code></pre> <p>As block sizes, we use powers of 2, starting from 8 up to 2048. We don’t define any block sizes smaller than 8 because each block must be capable of storing a 64-bit pointer to the next block when freed. 
For allocations greater than 2048 bytes, we will fall back to a linked list allocator.</p> <p>To simplify the implementation, we define the size of a block as its required alignment in memory. So a 16-byte block is always aligned on a 16-byte boundary and a 512-byte block is aligned on a 512-byte boundary. Since alignments always need to be powers of 2, this rules out any other block sizes. If we need block sizes that are not powers of 2 in the future, we can still adjust our implementation for this (e.g., by defining a second <code>BLOCK_ALIGNMENTS</code> array).</p> <h4 id="the-allocator-type-1"><a class="zola-anchor" href="#the-allocator-type-1" aria-label="Anchor link for: the-allocator-type-1">🔗</a>The Allocator Type</h4> <p>Using the <code>ListNode</code> type and the <code>BLOCK_SIZES</code> slice, we can now define our allocator type:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/fixed_size_block.rs </span><span> </span><span style="color:#569cd6;">pub struct </span><span>FixedSizeBlockAllocator { </span><span> list_heads: [Option&lt;</span><span style="color:#569cd6;">&amp;&#39;static mut</span><span> ListNode&gt;; BLOCK_SIZES.len()], </span><span> fallback_allocator: linked_list_allocator::Heap, </span><span>} </span></code></pre> <p>The <code>list_heads</code> field is an array of <code>head</code> pointers, one for each block size. This is implemented by using the <code>len()</code> of the <code>BLOCK_SIZES</code> slice as the array length. As a fallback allocator for allocations larger than the largest block size, we use the allocator provided by the <code>linked_list_allocator</code>. 
We could also use the <code>LinkedListAllocator</code> we implemented ourselves instead, but it has the disadvantage that it does not <a href="https://os.phil-opp.com/allocator-designs/#merging-freed-blocks">merge freed blocks</a>.</p> <p>For constructing a <code>FixedSizeBlockAllocator</code>, we provide the same <code>new</code> and <code>init</code> functions that we implemented for the other allocator types too:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/fixed_size_block.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>FixedSizeBlockAllocator { </span><span> </span><span style="color:#608b4e;">/// Creates an empty FixedSizeBlockAllocator. </span><span> </span><span style="color:#569cd6;">pub const fn </span><span>new() -&gt; </span><span style="color:#569cd6;">Self </span><span>{ </span><span> </span><span style="color:#569cd6;">const </span><span style="color:#b4cea8;">EMPTY</span><span>: Option&lt;</span><span style="color:#569cd6;">&amp;&#39;static mut</span><span> ListNode&gt; = None; </span><span> FixedSizeBlockAllocator { </span><span> list_heads: [</span><span style="color:#b4cea8;">EMPTY</span><span>; </span><span style="color:#b4cea8;">BLOCK_SIZES</span><span>.len()], </span><span> fallback_allocator: linked_list_allocator::Heap::empty(), </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">/// Initialize the allocator with the given heap bounds. </span><span> </span><span style="color:#608b4e;">/// </span><span> </span><span style="color:#608b4e;">/// This function is unsafe because the caller must guarantee that the given </span><span> </span><span style="color:#608b4e;">/// heap bounds are valid and that the heap is unused. This method must be </span><span> </span><span style="color:#608b4e;">/// called only once. 
</span><span> </span><span style="color:#569cd6;">pub unsafe fn </span><span>init(</span><span style="color:#569cd6;">&amp;mut </span><span>self, heap_start: </span><span style="color:#569cd6;">usize</span><span>, heap_size: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> self.fallback_allocator.init(heap_start, heap_size); </span><span> } </span><span>} </span></code></pre> <p>The <code>new</code> function just initializes the <code>list_heads</code> array with empty nodes and creates an <a href="https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.empty"><code>empty</code></a> linked list allocator as <code>fallback_allocator</code>. The <code>EMPTY</code> constant is needed to tell the Rust compiler that we want to initialize the array with a constant value. Initializing the array directly as <code>[None; BLOCK_SIZES.len()]</code> does not work, because then the compiler requires <code>Option&lt;&amp;'static mut ListNode&gt;</code> to implement the <code>Copy</code> trait, which it does not. This is a current limitation of the Rust compiler, which might go away in the future.</p> <p>If you haven’t done so already for the <code>LinkedListAllocator</code> implementation, you also need to add <strong><code>#![feature(const_mut_refs)]</code></strong> to the top of your <code>lib.rs</code>. The reason is that any use of mutable reference types in const functions is still unstable, including the <code>Option&lt;&amp;'static mut ListNode&gt;</code> array element type of the <code>list_heads</code> field (even if we set it to <code>None</code>).</p> <p>The unsafe <code>init</code> function only calls the <a href="https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.init"><code>init</code></a> function of the <code>fallback_allocator</code> without doing any additional initialization of the <code>list_heads</code> array. 
Instead, we will initialize the lists lazily on <code>alloc</code> and <code>dealloc</code> calls.</p> <p>For convenience, we also create a private <code>fallback_alloc</code> method that allocates using the <code>fallback_allocator</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/fixed_size_block.rs </span><span> </span><span style="color:#569cd6;">use </span><span>alloc::alloc::Layout; </span><span style="color:#569cd6;">use </span><span>core::ptr; </span><span> </span><span style="color:#569cd6;">impl </span><span>FixedSizeBlockAllocator { </span><span> </span><span style="color:#608b4e;">/// Allocates using the fallback allocator. </span><span> </span><span style="color:#569cd6;">fn </span><span>fallback_alloc(</span><span style="color:#569cd6;">&amp;mut </span><span>self, layout: Layout) -&gt; </span><span style="color:#569cd6;">*mut u8 </span><span>{ </span><span> </span><span style="color:#569cd6;">match </span><span>self.fallback_allocator.allocate_first_fit(layout) { </span><span> Ok(ptr) </span><span style="color:#569cd6;">=&gt;</span><span> ptr.as_ptr(), </span><span> Err(</span><span style="color:#569cd6;">_</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>ptr::null_mut(), </span><span> } </span><span> } </span><span>} </span></code></pre> <p>The <a href="https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html"><code>Heap</code></a> type of the <code>linked_list_allocator</code> crate does not implement <a href="https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html"><code>GlobalAlloc</code></a> (as it’s <a href="https://os.phil-opp.com/allocator-designs/#globalalloc-and-mutability">not possible without locking</a>). 
Instead, it provides an <a href="https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.allocate_first_fit"><code>allocate_first_fit</code></a> method that has a slightly different interface. Instead of returning a <code>*mut u8</code> and using a null pointer to signal an error, it returns a <code>Result&lt;NonNull&lt;u8&gt;, ()&gt;</code>. The <a href="https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html"><code>NonNull</code></a> type is an abstraction for a raw pointer that is guaranteed to not be a null pointer. By mapping the <code>Ok</code> case to the <a href="https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html#method.as_ptr"><code>NonNull::as_ptr</code></a> method and the <code>Err</code> case to a null pointer, we can easily translate this back to a <code>*mut u8</code> type.</p> <h4 id="calculating-the-list-index"><a class="zola-anchor" href="#calculating-the-list-index" aria-label="Anchor link for: calculating-the-list-index">🔗</a>Calculating the List Index</h4> <p>Before we implement the <code>GlobalAlloc</code> trait, we define a <code>list_index</code> helper function that returns the lowest possible block size for a given <a href="https://doc.rust-lang.org/alloc/alloc/struct.Layout.html"><code>Layout</code></a>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/fixed_size_block.rs </span><span> </span><span style="color:#608b4e;">/// Choose an appropriate block size for the given layout. </span><span style="color:#608b4e;">/// </span><span style="color:#608b4e;">/// Returns an index into the `BLOCK_SIZES` array. 
</span><span style="color:#569cd6;">fn </span><span>list_index(layout: </span><span style="color:#569cd6;">&amp;</span><span>Layout) -&gt; Option&lt;</span><span style="color:#569cd6;">usize</span><span>&gt; { </span><span> </span><span style="color:#569cd6;">let</span><span> required_block_size = layout.size().max(layout.align()); </span><span> </span><span style="color:#b4cea8;">BLOCK_SIZES</span><span>.iter().position(|</span><span style="color:#569cd6;">&amp;</span><span>s| s &gt;= required_block_size) </span><span>} </span></code></pre> <p>The block must have at least the size and alignment required by the given <code>Layout</code>. Since we defined that the block size is also its alignment, this means that the <code>required_block_size</code> is the <a href="https://doc.rust-lang.org/core/cmp/trait.Ord.html#method.max">maximum</a> of the layout’s <a href="https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.size"><code>size()</code></a> and <a href="https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align"><code>align()</code></a> attributes. To find the next-larger block in the <code>BLOCK_SIZES</code> slice, we first use the <a href="https://doc.rust-lang.org/std/primitive.slice.html#method.iter"><code>iter()</code></a> method to get an iterator and then the <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.position"><code>position()</code></a> method to find the index of the first block that is at least as large as the <code>required_block_size</code>.</p> <p>Note that we don’t return the block size itself, but the index into the <code>BLOCK_SIZES</code> slice. 
The reason is that we want to use the returned index as an index into the <code>list_heads</code> array.</p> <h4 id="implementing-globalalloc-2"><a class="zola-anchor" href="#implementing-globalalloc-2" aria-label="Anchor link for: implementing-globalalloc-2">🔗</a>Implementing <code>GlobalAlloc</code></h4> <p>The last step is to implement the <code>GlobalAlloc</code> trait:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/fixed_size_block.rs </span><span> </span><span style="color:#569cd6;">use super</span><span>::Locked; </span><span style="color:#569cd6;">use </span><span>alloc::alloc::GlobalAlloc; </span><span> </span><span style="color:#569cd6;">unsafe impl </span><span>GlobalAlloc </span><span style="color:#569cd6;">for </span><span>Locked&lt;FixedSizeBlockAllocator&gt; { </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>alloc(</span><span style="color:#569cd6;">&amp;</span><span>self, layout: Layout) -&gt; </span><span style="color:#569cd6;">*mut u8 </span><span>{ </span><span> todo!(); </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>dealloc(</span><span style="color:#569cd6;">&amp;</span><span>self, ptr: </span><span style="color:#569cd6;">*mut u8</span><span>, layout: Layout) { </span><span> todo!(); </span><span> } </span><span>} </span></code></pre> <p>Like for the other allocators, we don’t implement the <code>GlobalAlloc</code> trait directly for our allocator type, but use the <a href="https://os.phil-opp.com/allocator-designs/#a-locked-wrapper-type"><code>Locked</code> wrapper</a> to add synchronized interior mutability. 
Since the <code>alloc</code> and <code>dealloc</code> implementations are relatively large, we introduce them one by one in the following.</p> <h5 id="alloc"><a class="zola-anchor" href="#alloc" aria-label="Anchor link for: alloc">🔗</a><code>alloc</code></h5> <p>The implementation of the <code>alloc</code> method looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in `impl` block in src/allocator/fixed_size_block.rs </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>alloc(</span><span style="color:#569cd6;">&amp;</span><span>self, layout: Layout) -&gt; </span><span style="color:#569cd6;">*mut u8 </span><span>{ </span><span> </span><span style="color:#569cd6;">let mut</span><span> allocator = self.lock(); </span><span> </span><span style="color:#569cd6;">match </span><span>list_index(</span><span style="color:#569cd6;">&amp;</span><span>layout) { </span><span> Some(index) </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#569cd6;">match</span><span> allocator.list_heads[index].take() { </span><span> Some(node) </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> allocator.list_heads[index] = node.next.take(); </span><span> node </span><span style="color:#569cd6;">as *mut</span><span> ListNode </span><span style="color:#569cd6;">as *mut u8 </span><span> } </span><span> None </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#608b4e;">// no block exists in list =&gt; allocate new block </span><span> </span><span style="color:#569cd6;">let</span><span> block_size = </span><span style="color:#b4cea8;">BLOCK_SIZES</span><span>[index]; </span><span> </span><span style="color:#608b4e;">// only works if all block sizes are a power of 2 </span><span> </span><span style="color:#569cd6;">let</span><span> 
block_align = block_size; </span><span> </span><span style="color:#569cd6;">let</span><span> layout = Layout::from_size_align(block_size, block_align) </span><span> .unwrap(); </span><span> allocator.fallback_alloc(layout) </span><span> } </span><span> } </span><span> } </span><span> None </span><span style="color:#569cd6;">=&gt;</span><span> allocator.fallback_alloc(layout), </span><span> } </span><span>} </span></code></pre> <p>Let’s go through it step by step:</p> <p>First, we use the <code>Locked::lock</code> method to get a mutable reference to the wrapped allocator instance. Next, we call the <code>list_index</code> function we just defined to calculate the appropriate block size for the given layout and get the corresponding index into the <code>list_heads</code> array. If this index is <code>None</code>, no block size fits the allocation, so we allocate from the <code>fallback_allocator</code> through the <code>fallback_alloc</code> function.</p> <p>If the list index is <code>Some</code>, we try to remove the first node of the list that starts at <code>list_heads[index]</code> using the <a href="https://doc.rust-lang.org/core/option/enum.Option.html#method.take"><code>Option::take</code></a> method. If the list is not empty, we enter the <code>Some(node)</code> branch of the <code>match</code> statement, where we point the head pointer of the list to the successor of the popped <code>node</code> (by using <a href="https://doc.rust-lang.org/core/option/enum.Option.html#method.take"><code>take</code></a> again). Finally, we return the popped <code>node</code> pointer as a <code>*mut u8</code>.</p> <p>If the list head is <code>None</code>, it indicates that the list of blocks is empty. This means that we need to construct a new block as <a href="https://os.phil-opp.com/allocator-designs/#creating-new-blocks">described above</a>. 
For that, we first get the current block size from the <code>BLOCK_SIZES</code> slice and use it as both the size and the alignment for the new block. Then we create a new <code>Layout</code> from it and call the <code>fallback_alloc</code> method to perform the allocation. The reason for adjusting the size and alignment is that the block will be added to the block list on deallocation, where it must be able to serve any later request of that block size.</p> <h4 id="dealloc"><a class="zola-anchor" href="#dealloc" aria-label="Anchor link for: dealloc">🔗</a><code>dealloc</code></h4> <p>The implementation of the <code>dealloc</code> method looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator/fixed_size_block.rs </span><span> </span><span style="color:#569cd6;">use </span><span>core::{mem, ptr::NonNull}; </span><span> </span><span style="color:#608b4e;">// inside the `unsafe impl GlobalAlloc` block </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>dealloc(</span><span style="color:#569cd6;">&amp;</span><span>self, ptr: </span><span style="color:#569cd6;">*mut u8</span><span>, layout: Layout) { </span><span> </span><span style="color:#569cd6;">let mut</span><span> allocator = self.lock(); </span><span> </span><span style="color:#569cd6;">match </span><span>list_index(</span><span style="color:#569cd6;">&amp;</span><span>layout) { </span><span> Some(index) </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> new_node = ListNode { </span><span> next: allocator.list_heads[index].take(), </span><span> }; </span><span> </span><span style="color:#608b4e;">// verify that block has size and alignment required for storing node </span><span> assert!(mem::size_of::&lt;ListNode&gt;() &lt;= </span><span style="color:#b4cea8;">BLOCK_SIZES</span><span>[index]); </span><span> 
assert!(mem::align_of::&lt;ListNode&gt;() &lt;= </span><span style="color:#b4cea8;">BLOCK_SIZES</span><span>[index]); </span><span> </span><span style="color:#569cd6;">let</span><span> new_node_ptr = ptr </span><span style="color:#569cd6;">as *mut</span><span> ListNode; </span><span> new_node_ptr.write(new_node); </span><span> allocator.list_heads[index] = Some(</span><span style="color:#569cd6;">&amp;mut </span><span>*new_node_ptr); </span><span> } </span><span> None </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> ptr = NonNull::new(ptr).unwrap(); </span><span> allocator.fallback_allocator.deallocate(ptr, layout); </span><span> } </span><span> } </span><span>} </span></code></pre> <p>Like in <code>alloc</code>, we first use the <code>lock</code> method to get a mutable allocator reference and then the <code>list_index</code> function to get the block list corresponding to the given <code>Layout</code>. If the index is <code>None</code>, no fitting block size exists in <code>BLOCK_SIZES</code>, which indicates that the allocation was created by the fallback allocator. Therefore, we use its <a href="https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.deallocate"><code>deallocate</code></a> to free the memory again. The method expects a <a href="https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html"><code>NonNull</code></a> instead of a <code>*mut u8</code>, so we need to convert the pointer first. (The <code>unwrap</code> call only fails when the pointer is null, which should never happen when the compiler calls <code>dealloc</code>.)</p> <p>If <code>list_index</code> returns a block index, we need to add the freed memory block to the list. 
For that, we first create a new <code>ListNode</code> that points to the current list head (by using <a href="https://doc.rust-lang.org/core/option/enum.Option.html#method.take"><code>Option::take</code></a> again). Before we write the new node into the freed memory block, we first assert that the current block size specified by <code>index</code> has the required size and alignment for storing a <code>ListNode</code>. Then we perform the write by converting the given <code>*mut u8</code> pointer to a <code>*mut ListNode</code> pointer and then calling the unsafe <a href="https://doc.rust-lang.org/std/primitive.pointer.html#method.write"><code>write</code></a> method on it. The last step is to set the head pointer of the list, which is currently <code>None</code> since we called <code>take</code> on it, to our newly written <code>ListNode</code>. For that, we convert the raw <code>new_node_ptr</code> to a mutable reference.</p> <p>There are a few things worth noting:</p> <ul> <li>We don’t differentiate between blocks allocated from a block list and blocks allocated from the fallback allocator. This means that new blocks created in <code>alloc</code> are added to the block list on <code>dealloc</code>, thereby increasing the number of blocks of that size.</li> <li>The <code>alloc</code> method is the only place where new blocks are created in our implementation. This means that we initially start with empty block lists and only fill these lists lazily when allocations of their block size are performed.</li> <li>We don’t need <code>unsafe</code> blocks in <code>alloc</code> and <code>dealloc</code>, even though we perform some <code>unsafe</code> operations. The reason is that Rust currently treats the complete body of unsafe functions as one large <code>unsafe</code> block. 
Since using explicit <code>unsafe</code> blocks has the advantage that it’s obvious which operations are unsafe and which are not, there is a <a href="https://github.com/rust-lang/rfcs/pull/2585">proposed RFC</a> to change this behavior.</li> </ul> <h3 id="using-it-2"><a class="zola-anchor" href="#using-it-2" aria-label="Anchor link for: using-it-2">🔗</a>Using it</h3> <p>To use our new <code>FixedSizeBlockAllocator</code>, we need to update the <code>ALLOCATOR</code> static in the <code>allocator</code> module:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator.rs </span><span> </span><span style="color:#569cd6;">use </span><span>fixed_size_block::FixedSizeBlockAllocator; </span><span> </span><span>#[global_allocator] </span><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">ALLOCATOR</span><span>: Locked&lt;FixedSizeBlockAllocator&gt; = Locked::new( </span><span> FixedSizeBlockAllocator::new()); </span></code></pre> <p>Since the <code>init</code> function behaves the same for all allocators we implemented, we don’t need to modify the <code>init</code> call in <code>init_heap</code>.</p> <p>When we now run our <code>heap_allocation</code> tests again, all tests should still pass:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo test --test heap_allocation </span><span>simple_allocation... [ok] </span><span>large_vec... [ok] </span><span>many_boxes... [ok] </span><span>many_boxes_long_lived... [ok] </span></code></pre> <p>Our new allocator seems to work!</p> <h3 id="discussion-2"><a class="zola-anchor" href="#discussion-2" aria-label="Anchor link for: discussion-2">🔗</a>Discussion</h3> <p>While the fixed-size block approach has much better performance than the linked list approach, it wastes up to half of the memory when using powers of 2 as block sizes. 
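</p> <p>To make the “up to half” figure concrete, here is a minimal, hedged sketch that is not part of the allocator itself. It assumes block sizes of 8 to 2048 bytes (all powers of 2), ignores alignment, and treats requests above the largest block size as going to the fallback allocator without rounding. The helper names <code>served_size</code> and <code>wasted_bytes</code> are hypothetical and only exist for this illustration:</p>

```rust
// Assumption for this sketch: power-of-2 block sizes from 8 bytes to 2 KiB.
const BLOCK_SIZES: [usize; 9] = [8, 16, 32, 64, 128, 256, 512, 1024, 2048];

/// Hypothetical helper: size of the block that would serve a request of
/// `size` bytes. Requests larger than the biggest block size are assumed
/// to go to the fallback allocator, which is modeled here as no rounding.
fn served_size(size: usize) -> usize {
    BLOCK_SIZES
        .iter()
        .copied()
        .find(|block| *block >= size)
        .unwrap_or(size)
}

/// Hypothetical helper: bytes lost to internal fragmentation.
fn wasted_bytes(size: usize) -> usize {
    served_size(size) - size
}

fn main() {
    for size in [8, 9, 65, 129, 1025, 4000] {
        let block = served_size(size);
        let wasted = wasted_bytes(size);
        println!("request {size:>4} B, served {block:>4} B, wasted {wasted:>4} B");
    }
}
```

<p>In this sketch, a 129-byte request is served from a 256-byte block, so 127 bytes (almost half of the block) stay unused, while an 8-byte request wastes nothing.</p> <p>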
Whether this tradeoff is worth it heavily depends on the application type. For an operating system kernel, where performance is critical, the fixed-size block approach seems to be the better choice.</p> <p>On the implementation side, there are various things that we could improve in our current implementation:</p> <ul> <li>Instead of only allocating blocks lazily using the fallback allocator, it might be better to pre-fill the lists to improve the performance of initial allocations.</li> <li>To simplify the implementation, we only allowed block sizes that are powers of 2 so that we could also use them as the block alignment. By storing (or calculating) the alignment in a different way, we could also allow arbitrary other block sizes. This way, we could add more block sizes, e.g., for common allocation sizes, in order to minimize the wasted memory.</li> <li>We currently only create new blocks, but never free them again. This results in fragmentation and might eventually result in allocation failure for large allocations. It might make sense to enforce a maximum list length for each block size. When the maximum length is reached, subsequent deallocations are freed using the fallback allocator instead of being added to the list.</li> <li>Instead of falling back to a linked list allocator, we could have a special allocator for allocations greater than 4 KiB. The idea is to utilize <a href="https://os.phil-opp.com/paging-introduction/">paging</a>, which operates on 4 KiB pages, to map a continuous block of virtual memory to non-continuous physical frames. This way, fragmentation of unused memory is no longer a problem for large allocations.</li> <li>With such a page allocator, it might make sense to add block sizes up to 4 KiB and drop the linked list allocator completely. 
The main advantages of this would be reduced fragmentation and improved performance predictability, i.e., better worst-case performance.</li> </ul> <p>It’s important to note that the implementation improvements outlined above are only suggestions. Allocators used in operating system kernels are typically highly optimized for the specific workload of the kernel, which is only possible through extensive profiling.</p> <h3 id="variations"><a class="zola-anchor" href="#variations" aria-label="Anchor link for: variations">🔗</a>Variations</h3> <p>There are also many variations of the fixed-size block allocator design. Two popular examples are the <em>slab allocator</em> and the <em>buddy allocator</em>, which are also used in popular kernels such as Linux. In the following, we give a short introduction to these two designs.</p> <h4 id="slab-allocator"><a class="zola-anchor" href="#slab-allocator" aria-label="Anchor link for: slab-allocator">🔗</a>Slab Allocator</h4> <p>The idea behind a <a href="https://en.wikipedia.org/wiki/Slab_allocation">slab allocator</a> is to use block sizes that directly correspond to selected types in the kernel. This way, allocations of those types fit a block size exactly and no memory is wasted. Sometimes, it might be even possible to preinitialize type instances in unused blocks to further improve performance.</p> <p>Slab allocation is often combined with other allocators. For example, it can be used together with a fixed-size block allocator to further split an allocated block in order to reduce memory waste. 
It is also often used to implement an <a href="https://en.wikipedia.org/wiki/Object_pool_pattern">object pool pattern</a> on top of a single large allocation.</p> <h4 id="buddy-allocator"><a class="zola-anchor" href="#buddy-allocator" aria-label="Anchor link for: buddy-allocator">🔗</a>Buddy Allocator</h4> <p>Instead of using a linked list to manage freed blocks, the <a href="https://en.wikipedia.org/wiki/Buddy_memory_allocation">buddy allocator</a> design uses a <a href="https://en.wikipedia.org/wiki/Binary_tree">binary tree</a> data structure together with power-of-2 block sizes. When a new block of a certain size is required, it splits a larger sized block into two halves, thereby creating two child nodes in the tree. Whenever a block is freed again, its neighbor block in the tree is analyzed. If the neighbor is also free, the two blocks are joined back together to form a block of twice the size.</p> <p>The advantage of this merge process is that <a href="https://en.wikipedia.org/wiki/Fragmentation_(computing)#External_fragmentation">external fragmentation</a> is reduced so that small freed blocks can be reused for a large allocation. It also does not use a fallback allocator, so the performance is more predictable. The biggest drawback is that only power-of-2 block sizes are possible, which might result in a large amount of wasted memory due to <a href="https://en.wikipedia.org/wiki/Fragmentation_(computing)#Internal_fragmentation">internal fragmentation</a>. For this reason, buddy allocators are often combined with a slab allocator to further split an allocated block into multiple smaller blocks.</p> <h2 id="summary"><a class="zola-anchor" href="#summary" aria-label="Anchor link for: summary">🔗</a>Summary</h2> <p>This post gave an overview of different allocator designs. 
We learned how to implement a basic <a href="https://os.phil-opp.com/allocator-designs/#bump-allocator">bump allocator</a>, which hands out memory linearly by increasing a single <code>next</code> pointer. While bump allocation is very fast, it can only reuse memory after all allocations have been freed. For this reason, it is rarely used as a global allocator.</p> <p>Next, we created a <a href="https://os.phil-opp.com/allocator-designs/#linked-list-allocator">linked list allocator</a> that uses the freed memory blocks itself to create a linked list, the so-called <a href="https://en.wikipedia.org/wiki/Free_list">free list</a>. This list makes it possible to store an arbitrary number of freed blocks of different sizes. While no memory waste occurs, the approach suffers from poor performance because an allocation request might require a complete traversal of the list. Our implementation also suffers from <a href="https://en.wikipedia.org/wiki/Fragmentation_(computing)#External_fragmentation">external fragmentation</a> because it does not merge adjacent freed blocks back together.</p> <p>To fix the performance problems of the linked list approach, we created a <a href="https://os.phil-opp.com/allocator-designs/#fixed-size-block-allocator">fixed-size block allocator</a> that predefines a fixed set of block sizes. For each block size, a separate <a href="https://en.wikipedia.org/wiki/Free_list">free list</a> exists so that allocations and deallocations only need to insert/pop at the front of the list and are thus very fast. Since each allocation is rounded up to the next larger block size, some memory is wasted due to <a href="https://en.wikipedia.org/wiki/Fragmentation_(computing)#Internal_fragmentation">internal fragmentation</a>.</p> <p>There are many more allocator designs with different tradeoffs. 
<a href="https://os.phil-opp.com/allocator-designs/#slab-allocator">Slab allocation</a> works well to optimize the allocation of common fixed-size structures, but is not applicable in all situations. <a href="https://os.phil-opp.com/allocator-designs/#buddy-allocator">Buddy allocation</a> uses a binary tree to merge freed blocks back together, but wastes a large amount of memory because it only supports power-of-2 block sizes. It’s also important to remember that each kernel implementation has a unique workload, so there is no “best” allocator design that fits all cases.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>With this post, we conclude our memory management implementation for now. Next, we will start exploring <a href="https://en.wikipedia.org/wiki/Computer_multitasking"><em>multitasking</em></a>, starting with cooperative multitasking in the form of <a href="https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html"><em>async/await</em></a>. In subsequent posts, we will then explore <a href="https://en.wikipedia.org/wiki/Thread_(computing)"><em>threads</em></a>, <a href="https://en.wikipedia.org/wiki/Multiprocessing"><em>multiprocessing</em></a>, and <a href="https://en.wikipedia.org/wiki/Process_(computing)"><em>processes</em></a>.</p> Updates in December 2019 Tue, 07 Jan 2020 00:00:00 +0000 https://os.phil-opp.com/status-update/2020-01-07/ https://os.phil-opp.com/status-update/2020-01-07/ <p>Happy New Year!</p> <p>This post gives an overview of the recent updates to the <em>Writing an OS in Rust</em> blog and the corresponding libraries and tools.</p> <h2 id="blog-os"><code>blog_os</code></h2> <p>The repository of the <em>Writing an OS in Rust</em> blog received the following updates:</p> <ul> <li>Update <code>x86_64</code> dependency to version 0.8.1. 
This included the <a href="https://github.com/phil-opp/blog_os/pull/701">dependency update</a> itself, an <a href="https://github.com/phil-opp/blog_os/pull/703">update of the frame allocation code</a>, and an <a href="https://github.com/phil-opp/blog_os/pull/704">update of the blog</a>.</li> <li><a href="https://github.com/phil-opp/blog_os/pull/705">License the <code>blog/content</code> folder under CC BY-NC</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/709">Reword sentence in first post</a> by <a href="https://github.com/pamolloy">@pamolloy</a></li> </ul> <p>Further, we’re still working on adding <a href="https://github.com/phil-opp/blog_os/pull/692">Experimental Support for Community Translations</a> to the blog, starting with <a href="https://github.com/phil-opp/blog_os/pull/694">Simplified Chinese</a> and <a href="https://github.com/phil-opp/blog_os/pull/699">Traditional Chinese</a>. Any help is appreciated!</p> <h2 id="bootloader"><code>bootloader</code></h2> <p>There were no updates to the bootloader this month.</p> <p>I’m currently working on rewriting the 16-bit/32-bit stages in Rust and making the bootloader more modular in the process. 
This should make it much easier to add support for UEFI and GRUB booting later.</p> <h2 id="bootimage"><code>bootimage</code></h2> <p>There were no updates to the <code>bootimage</code> tool this month.</p> <h2 id="x86-64"><code>x86_64</code></h2> <p>We landed a number of breaking changes this month:</p> <ul> <li><a href="https://github.com/rust-osdev/x86_64/pull/91">Replace <code>ux</code> dependency with custom wrapper structs</a></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/89">Add new UnusedPhysFrame type and use it in Mapper::map_to</a></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/84">Make Mapper trait object safe by adding <code>Self: Sized</code> bounds on generic functions</a></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/108">Rename divide_by_zero field of IDT to divide_error</a></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/109">Introduce new diverging handler functions for exceptions classified as “abort”</a></li> </ul> <p>These changes were released as version 0.8.0. Unfortunately, there was a missing re-export for the new <code>UnusedPhysFrame</code> type. 
We fixed it in <a href="https://github.com/rust-osdev/x86_64/pull/110">#110</a> and released the fix as version 0.8.1.</p> <p>There was one more addition to the <code>x86_64</code> crate afterwards:</p> <ul> <li><a href="https://github.com/rust-osdev/x86_64/pull/111">Add support for cr4 control register (with complete documentation)</a> by <a href="https://github.com/KarimAllah">@KarimAllah</a> (released as version 0.8.2).</li> </ul> <p>There were also a few changes related to continuous integration:</p> <ul> <li><a href="https://github.com/rust-osdev/x86_64/pull/103">Remove bors from this repo</a></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/104">Run ‘push’ builds only for master branch</a></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/105">Remove Travis CI and Azure Pipelines scripts</a></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/100">Add caching of cargo crates to GitHub Actions CI</a></li> </ul> <h2 id="cargo-xbuild"><code>cargo-xbuild</code></h2> <p>The <code>cargo-xbuild</code> crate, which cross-compiles the sysroot, received the following updates this month:</p> <ul> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/43">Add <code>--quiet</code> flag that suppresses “waiting for file lock” message</a> by <a href="https://github.com/Nils-TUD">@Nils-TUD</a> (published as version 0.5.19)</li> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/50">Fix wrong feature name for memcpy=false</a> (released as version 0.5.20)</li> </ul> Updates in October and November 2019 Mon, 02 Dec 2019 00:00:00 +0000 https://os.phil-opp.com/status-update/2019-12-02/ https://os.phil-opp.com/status-update/2019-12-02/ <p>This post gives an overview of the recent updates to the <em>Writing an OS in Rust</em> blog and the used libraries and tools.</p> <p>I moved to a new apartment mid-October and had lots of work to do there, so I didn’t have the time for creating the October status update post. 
Therefore, this post lists the changes from both October and November. I’m slowly picking up speed again, but I still have a lot of mails in my backlog. Sorry if you haven’t received an answer yet!</p> <h2 id="blog-os"><code>blog_os</code></h2> <p>The blog itself received only a minor update: <a href="https://github.com/phil-opp/blog_os/pull/687">Use panic! instead of println! + loop in double fault handler</a>. This fixes an issue where a double fault during <code>cargo xtest</code> leads to an endless loop without any output on the serial port.</p> <p>We also have other news: We plan to add <a href="https://github.com/phil-opp/blog_os/pull/692">Experimental Support for Community Translations</a> to the blog. While this imposes additional challenges, it makes the content accessible to people who don’t speak English, so it’s definitely worth trying in my opinion. The first additional language will be <a href="https://github.com/phil-opp/blog_os/pull/694">Chinese</a>, based on an <a href="https://github.com/rustcc/writing-an-os-in-rust">existing translation</a> by <a href="https://github.com/luojia65">@luojia65</a>. 
Many thanks also to <a href="https://github.com/TheBegining">@TheBegining</a> and <a href="https://github.com/Rustin-Liu">@Rustin-Liu</a> for helping with the translation!</p> <h2 id="bootloader"><code>bootloader</code></h2> <ul> <li><a href="https://github.com/rust-osdev/bootloader/pull/81">Change the way the kernel entry point is called to honor alignment ABI</a> by <a href="https://github.com/GuillaumeDIDIER">@GuillaumeDIDIER</a> (published as version 0.8.2)</li> <li><a href="https://github.com/rust-osdev/bootloader/pull/82">Add support for Github Actions</a></li> <li><a href="https://github.com/rust-osdev/bootloader/pull/85">Remove unnecessary <code>extern C</code> on panic handler to fix not-ffi-safe warning</a> by <a href="https://github.com/cmsd2">@cmsd2</a> (published as version 0.8.3)</li> </ul> <h2 id="bootimage"><code>bootimage</code></h2> <ul> <li><a href="https://github.com/rust-osdev/bootimage/pull/47">Don’t exit with expected exit code when failed to read QEMU exit code</a></li> </ul> <h2 id="x86-64"><code>x86_64</code></h2> <ul> <li><a href="https://github.com/rust-osdev/x86_64/pull/93">Switch to GitHub Actions for CI</a></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/94">Use <code>repr C</code> to suppress not-ffi-safe when used with extern handler functions</a> by <a href="https://github.com/cmsd2">@cmsd2</a> (published as version 0.7.6)</li> <li><a href="https://github.com/rust-osdev/x86_64/pull/95">Add <code>slice</code> and <code>slice_mut</code> methods to IDT</a> by <a href="https://github.com/foxcob">@foxcob</a> (published as version 0.7.7)</li> </ul> <h2 id="cargo-xbuild"><code>cargo-xbuild</code></h2> <ul> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/47">Add support for publishing and installing cross compiled crates</a> by <a href="https://github.com/ALSchwalm">@ALSchwalm</a> (published as version 0.5.18)</li> </ul> Updates in September 2019 Sun, 06 Oct 2019 00:00:00 +0000 
https://os.phil-opp.com/status-update/2019-10-06/ https://os.phil-opp.com/status-update/2019-10-06/ <p>This post gives an overview of the recent updates to the <em>Writing an OS in Rust</em> blog and the corresponding libraries and tools.</p> <p>I finished my master’s thesis and got my degree this month, so I only had limited time for my open source work. I still managed to make a few minor updates, including code simplifications for the <em>Paging Implementation</em> post and the evaluation of GitHub Actions as a CI service.</p> <h2 id="blog-os"><code>blog_os</code></h2> <ul> <li><a href="https://github.com/phil-opp/blog_os/pull/666">Improve Paging Implementation Post</a>: Improves and simplifies the code in multiple places</li> <li><a href="https://github.com/phil-opp/blog_os/pull/660">Use GitHub Actions to build and deploy blog</a></li> <li>Set up GitHub Actions for <code>post-XX</code> branches: <a href="https://github.com/phil-opp/blog_os/pull/661"><code>post-01</code></a>, <a href="https://github.com/phil-opp/blog_os/pull/662"><code>post-02</code></a>, <a href="https://github.com/phil-opp/blog_os/pull/663"><code>post-04</code></a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/664">Update to bootloader 0.8.0</a>: Considerably reduces compile times</li> <li><a href="https://github.com/phil-opp/blog_os/pull/670">Update to Zola 0.9.0</a>: Updates the used static site generator to the latest version</li> </ul> <h2 id="cargo-xbuild"><code>cargo-xbuild</code></h2> <ul> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/44">Print a warning when building for the host target</a></li> </ul> <h2 id="bootloader"><code>bootloader</code></h2> <ul> <li><a href="https://github.com/rust-osdev/bootloader/pull/77">Add a Cargo Feature for Enabling SSE</a></li> </ul> <h2 id="uart-16550"><code>uart_16550</code></h2> <ul> <li><a href="https://github.com/rust-osdev/uart_16550/pull/1">Update to x86_64 0.7.3 and bitflags</a></li> <li><a 
href="https://github.com/rust-osdev/uart_16550/pull/2">Document how serial port is configured by default</a> by <a href="https://github.com/edigaryev">@edigaryev</a></li> </ul> <h2 id="x86-64"><code>x86_64</code></h2> <p>No updates were merged in September. However, I’m planning some breaking changes for the crate, namely:</p> <ul> <li><a href="https://github.com/rust-osdev/x86_64/pull/91">Replace <code>ux</code> dependency with custom wrapper structs</a> to reduce compile times</li> <li><a href="https://github.com/rust-osdev/x86_64/pull/89">Add new UnsafePhysFrame type and use it in Mapper::map_to</a></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/84">Make Mapper trait object safe by adding <code>Self: Sized</code> bounds on generic functions</a></li> </ul> <!-- ## `bootimage` No updates were merged in September. --> Updates in August 2019 Mon, 09 Sep 2019 00:00:00 +0000 https://os.phil-opp.com/status-update/2019-09-09/ https://os.phil-opp.com/status-update/2019-09-09/ <p>This post gives an overview of the recent updates to the <em>Writing an OS in Rust</em> blog and the used libraries and tools.</p> <p>I was very busy with finishing my master’s thesis, so I didn’t have any to implement any notable changes myself. Thanks to contributions by <a href="https://github.com/vinaychandra">@vinaychandra</a> and <a href="https://github.com/64">@64</a>, we were still able to publish new versions of the <code>x86_64</code>, <code>bootimage</code> and <code>bootloader</code> crates.</p> <h2 id="blog-os"><code>blog_os</code></h2> <p>Apart from <a href="https://github.com/phil-opp/blog_os/pull/650">rewriting the section about no-harness tests</a> of the <em>Testing</em> post, there were no notable changes to the blog in August. 
Now that I have some more free time again, I plan to upgrade the blog to the latest versions of <code>bootloader</code> and <code>bootimage</code>, evaluate the use of <a href="https://github.com/features/actions">GitHub Actions</a> for the repository, and continue the work on the upcoming post about heap allocator implementations.</p> <h2 id="x86-64"><code>x86_64</code></h2> <p>Thanks to <a href="https://github.com/vinaychandra">@vinaychandra</a>, the <code>x86_64</code> crate now has <a href="https://github.com/rust-osdev/x86_64/pull/87">support for the <code>FsBase</code> and <code>GsBase</code> registers</a>. The change was published as version 0.7.5.</p> <h2 id="bootimage"><code>bootimage</code></h2> <p>To allow bootloaders to read configuration from the <code>Cargo.toml</code> file of the kernel, the <code>bootimage</code> crate now <a href="https://github.com/rust-osdev/bootimage/pull/45">passes the location of the kernel’s Cargo.toml to bootloader crates</a>. This change was implemented by <a href="https://github.com/64">@64</a> and published as version 0.7.7.</p> <h2 id="bootloader"><code>bootloader</code></h2> <p>Apart from initializing the CPU and loading the kernel, the <code>bootloader</code> crate is also responsible for creating several memory regions for the kernel, for example a program stack and the boot information struct. These regions must be mapped at some address in the virtual address space.</p> <p>As a stop-gap solution, the <code>bootloader</code> crate used fixed virtual addresses for these regions, which resulted in errors if the kernel tried to use the same address ranges itself. For example, the (optional) recursive mapping of page tables often conflicted with so-called <em>higher half kernels</em>, which live at the upper end of the address space. 
To avoid these conflicts, <a href="https://github.com/64">@64</a> updated the <code>bootloader</code> crate to <a href="https://github.com/rust-osdev/bootloader/pull/71">dynamically map the kernel stack, boot info, physical memory, and recursive table regions</a> at an unused virtual address range.</p> <p>To also support specifying explicit addresses for these regions, <a href="https://github.com/64">@64</a> further added support for <a href="https://github.com/rust-osdev/bootloader/pull/73">parsing bootloader configuration from the kernel’s Cargo.toml</a>. This way, the virtual addresses of the kernel stack and physical memory mapping can now be configured using a <code>package.metadata.bootloader</code> key in the <code>Cargo.toml</code> of the kernel. In a third pull request, <a href="https://github.com/64">@64</a> also made the <a href="https://github.com/rust-osdev/bootloader/pull/72">kernel stack size configurable</a>.</p> <p>The changes were published together as version 0.8.0. This is a breaking update because the new configuration system requires at least version 0.7.7 of <code>bootimage</code>, which is the first version that passes the location of the kernel’s <code>Cargo.toml</code> file.</p> Updates in July 2019 Fri, 02 Aug 2019 00:00:00 +0000 https://os.phil-opp.com/status-update/2019-08-02/ https://os.phil-opp.com/status-update/2019-08-02/ <p>This post gives an overview of the recent updates to the <em>Writing an OS in Rust</em> blog and the used libraries and tools.</p> <p>Since I’m still very busy with my master’s thesis, I haven’t had the time to work on a new post. But there were quite a few maintenance updates this month and also a few new features such as the new <code>OffsetPageTable</code> type in the <code>x86_64</code> crate.</p> <p>We also had some great contributions this month. 
Thanks to the efforts of <a href="https://github.com/64">@64</a>, we were able to considerably lower the compile times of the <code>x86_64</code> and <code>bootloader</code> crates. Thanks to <a href="https://github.com/Aehmlo">@Aehmlo</a>, the <code>cargo-xbuild</code> crate now has a <code>cargo xdoc</code> subcommand and support for the <code>cargo {c, b, t, r}</code> aliases.</p> <p>The following list gives a short overview of notable changes to the different projects.</p> <h2 id="blog-os">blog_os</h2> <ul> <li><a href="https://github.com/phil-opp/blog_os/pull/638">Fix a lot of dead links in both the second and first edition</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/644">Update paging introduction post to use page fault error code</a></li> </ul> <h2 id="x86-64">x86_64</h2> <ul> <li><a href="https://github.com/rust-osdev/x86_64/pull/82">Reexport MappedPageTable on non-x86_64 platforms too</a></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/78">Update GDT docs, add user_data_segment function and WRITABLE flag</a> by <a href="https://github.com/64">@64</a> (published as version 0.7.2)</li> <li><a href="https://github.com/rust-osdev/x86_64/pull/83">Add a new <code>OffsetPageTable</code> mapper type</a> (published as version 0.7.3)</li> <li><a href="https://github.com/rust-osdev/x86_64/pull/86">Update integration tests to use new testing framework</a></li> <li><a href="https://github.com/rust-osdev/x86_64/pull/85">Remove raw-cpuid dependency and use rdrand intrinsics</a> by <a href="https://github.com/64">@64</a> (published as version 0.7.4)</li> </ul> <h2 id="bootloader">bootloader</h2> <ul> <li><a href="https://github.com/rust-osdev/bootloader/pull/62">Remove stabilized publish-lockfile feature</a> (published as version 0.6.2)</li> <li><a href="https://github.com/rust-osdev/bootloader/pull/63">Update CI badge, use latest version of x86_64 crate and rustfmt</a> by <a href="https://github.com/64">@64</a> (published as version 
0.6.3)</li> <li><a href="https://github.com/rust-osdev/bootloader/pull/67">Use volatile accesses in VGA code and make font dependency optional</a> by <a href="https://github.com/64">@64</a> <ul> <li>Making the dependency optional should improve compile times when the VGA text mode is used</li> <li>Published as version 0.6.4</li> </ul> </li> <li><strong>Breaking</strong>: <a href="https://github.com/rust-osdev/bootloader/pull/68">Only include dependencies when <code>binary</code> feature is enabled</a> (published as version 0.7.0)</li> </ul> <h2 id="bootimage">bootimage</h2> <ul> <li><a href="https://github.com/rust-osdev/bootimage/pull/43">If the bootloader has a feature named <code>binary</code>, enable it</a> (published as version 0.7.6) <ul> <li>This is required for building <code>bootloader 0.7.0</code> or later</li> </ul> </li> </ul> <h2 id="cargo-xbuild">cargo-xbuild</h2> <ul> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/39">Add <code>cargo xdoc</code> command for invoking <code>cargo doc</code></a> by <a href="https://github.com/Aehmlo">@Aehmlo</a> (published as version 0.5.13)</li> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/40">Don’t append a <code>--sysroot</code> argument to <code>RUSTFLAGS</code> if it already contains one</a> (published as version 0.5.14)</li> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/42">Add xb, xt, xc, and xr subcommands</a> by <a href="https://github.com/Aehmlo">@Aehmlo</a> (published as version 0.5.15)</li> </ul> Updates in June 2019 Sat, 06 Jul 2019 00:00:00 +0000 https://os.phil-opp.com/status-update/2019-07-06/ https://os.phil-opp.com/status-update/2019-07-06/ <p>This post gives an overview of the recent updates to the <em>Writing an OS in Rust</em> blog and the used libraries and tools.</p> <p>My focus this month was to finish the <a href="https://os.phil-opp.com/heap-allocation/"><em>Heap Allocation</em></a> post, on which I had been working since March. 
I originally wanted to include a section about different allocator designs (bump, linked list, slab, …) and how to implement them, but I decided to split it out into a separate post because it became much too long. I’ll try to release this half-done post soon.</p> <p>Apart from the new post, there were some minor updates to the <code>x86_64</code>, <code>bootloader</code> and <code>cargo-xbuild</code> crates. The following gives a short overview of notable changes to the different projects.</p> <h2 id="blog-os">blog_os</h2> <ul> <li><a href="https://github.com/phil-opp/blog_os/pull/617">Use misspell tool to look for common typos</a></li> <li><a href="https://github.com/phil-opp/blog_os/pull/625">New post about heap allocation</a></li> </ul> <h2 id="x86-64">x86_64</h2> <ul> <li><a href="https://github.com/rust-osdev/x86_64/pull/77">Add ring-3 flag to GDT descriptor</a> by <a href="https://github.com/mark-i-m">@mark-i-m</a> (released as version 0.7.1)</li> <li><a href="https://github.com/rust-osdev/x86_64/pull/79">Add bochs magic breakpoint, read instruction pointer, inline instructions</a> by <a href="https://github.com/64">@64</a></li> </ul> <h2 id="bootloader">bootloader</h2> <ul> <li><a href="https://github.com/rust-osdev/bootloader/pull/58">Make the physical memory offset configurable through a <code>BOOTLOADER_PHYSICAL_MEMORY_OFFSET</code> environment variable</a></li> <li><a href="https://github.com/rust-osdev/bootloader/pull/59">Use a stripped copy of the kernel binary (debug info removed) to reduce load times</a> (released as version 0.6.1)</li> </ul> <!-- ## Bootimage --> <h2 id="cargo-xbuild">cargo-xbuild</h2> <ul> <li><a href="https://github.com/rust-osdev/cargo-xbuild/commit/994b5e75e1a4062cf506700e0ff38d5404338a37">Document the XBUILD_SYSROOT_PATH environment variable</a></li> <li><a href="https://github.com/rust-osdev/cargo-xbuild/commit/a1ff03311dd74447e8e845b4b96f2e137850027d">Fix incorrect joining of paths that caused some problems on Windows</a></li> 
</ul> Heap Allocation Wed, 26 Jun 2019 00:00:00 +0000 https://os.phil-opp.com/heap-allocation/ https://os.phil-opp.com/heap-allocation/ <p>This post adds support for heap allocation to our kernel. First, it gives an introduction to dynamic memory and shows how the borrow checker prevents common allocation errors. It then implements the basic allocation interface of Rust, creates a heap memory region, and sets up an allocator crate. At the end of this post, all the allocation and collection types of the built-in <code>alloc</code> crate will be available to our kernel.</p> <span id="continue-reading"></span> <p>This blog is openly developed on <a href="https://github.com/phil-opp/blog_os">GitHub</a>. If you have any problems or questions, please open an issue there. You can also leave comments <a href="https://os.phil-opp.com/heap-allocation/#comments">at the bottom</a>. The complete source code for this post can be found in the <a href="https://github.com/phil-opp/blog_os/tree/post-10"><code>post-10</code></a> branch.</p> <!-- fix for zola anchor checker (target is in template): <a id="comments"> --> <!-- toc --> <h2 id="local-and-static-variables"><a class="zola-anchor" href="#local-and-static-variables" aria-label="Anchor link for: local-and-static-variables">🔗</a>Local and Static Variables</h2> <p>We currently use two types of variables in our kernel: local variables and <code>static</code> variables. Local variables are stored on the <a href="https://en.wikipedia.org/wiki/Call_stack">call stack</a> and are only valid until the surrounding function returns. 
Static variables are stored at a fixed memory location and always live for the complete lifetime of the program.</p> <h3 id="local-variables"><a class="zola-anchor" href="#local-variables" aria-label="Anchor link for: local-variables">🔗</a>Local Variables</h3> <p>Local variables are stored on the <a href="https://en.wikipedia.org/wiki/Call_stack">call stack</a>, which is a <a href="https://en.wikipedia.org/wiki/Stack_(abstract_data_type)">stack data structure</a> that supports <code>push</code> and <code>pop</code> operations. On each function entry, the parameters, the return address, and the local variables of the called function are pushed by the compiler:</p> <p><img src="https://os.phil-opp.com/heap-allocation/call-stack.svg" alt="An outer() and an inner(i: usize) function, where outer calls inner(1). Both have some local variables. The call stack contains the following slots: the local variables of outer, then the argument i = 1, then the return address, then the local variables of inner." /></p> <p>The above example shows the call stack after the <code>outer</code> function called the <code>inner</code> function. We see that the call stack contains the local variables of <code>outer</code> first. On the <code>inner</code> call, the parameter <code>1</code> and the return address for the function were pushed. Then control was transferred to <code>inner</code>, which pushed its local variables.</p> <p>After the <code>inner</code> function returns, its part of the call stack is popped again and only the local variables of <code>outer</code> remain:</p> <p><img src="https://os.phil-opp.com/heap-allocation/call-stack-return.svg" alt="The call stack containing only the local variables of outer" /></p> <p>We see that the local variables of <code>inner</code> only live until the function returns. 
The Rust compiler enforces these lifetimes and throws an error when we use a value for too long, for example when we try to return a reference to a local variable:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>inner(i: </span><span style="color:#569cd6;">usize</span><span>) -&gt; </span><span style="color:#569cd6;">&amp;&#39;static u32 </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> z = [</span><span style="color:#b5cea8;">1</span><span>, </span><span style="color:#b5cea8;">2</span><span>, </span><span style="color:#b5cea8;">3</span><span>]; </span><span> </span><span style="color:#569cd6;">&amp;</span><span>z[i] </span><span>} </span></code></pre> <p>(<a href="https://play.rust-lang.org/?version=stable&amp;mode=debug&amp;edition=2018&amp;gist=6186a0f3a54f468e1de8894996d12819">run the example on the playground</a>)</p> <p>While returning a reference makes no sense in this example, there are cases where we want a variable to live longer than the function. We already saw such a case in our kernel when we tried to <a href="https://os.phil-opp.com/cpu-exceptions/#loading-the-idt">load an interrupt descriptor table</a> and had to use a <code>static</code> variable to extend the lifetime.</p> <h3 id="static-variables"><a class="zola-anchor" href="#static-variables" aria-label="Anchor link for: static-variables">🔗</a>Static Variables</h3> <p>Static variables are stored at a fixed memory location separate from the stack. This memory location is assigned at compile time by the linker and encoded in the executable. 
Statics live for the complete runtime of the program, so they have the <code>'static</code> lifetime and can always be referenced from local variables:</p> <p><img src="https://os.phil-opp.com/heap-allocation/call-stack-static.svg" alt="The same outer/inner example, except that inner has a static Z: [u32; 3] = [1,2,3]; and returns a &amp;Z[i] reference" /></p> <p>When the <code>inner</code> function returns in the above example, its part of the call stack is destroyed. The static variables live in a separate memory range that is never destroyed, so the <code>&amp;Z[1]</code> reference is still valid after the return.</p> <p>Apart from the <code>'static</code> lifetime, static variables also have the useful property that their location is known at compile time, so that no reference is needed for accessing them. We utilized that property for our <code>println</code> macro: By using a <a href="https://os.phil-opp.com/vga-text-mode/#a-global-interface">static <code>Writer</code></a> internally, there is no <code>&amp;mut Writer</code> reference needed to invoke the macro, which is very useful in <a href="https://os.phil-opp.com/cpu-exceptions/#implementation">exception handlers</a>, where we don’t have access to any additional variables.</p> <p>However, this property of static variables brings a crucial drawback: they are read-only by default. Rust enforces this because a <a href="https://doc.rust-lang.org/nomicon/races.html">data race</a> would occur if, e.g., two threads modified a static variable at the same time. The only way to modify a static variable is to encapsulate it in a <a href="https://docs.rs/spin/0.5.2/spin/struct.Mutex.html"><code>Mutex</code></a> type, which ensures that only a single <code>&amp;mut</code> reference exists at any point in time. 
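</p> <p>As a minimal sketch of this pattern, the following example wraps a mutable static in a mutex. It assumes the standard library’s <code>std::sync::Mutex</code> as a stand-in for the <code>spin::Mutex</code> used in the kernel, so that it can run on a normal host; the <code>COUNTER</code> static and the <code>increment</code> function are hypothetical names chosen for illustration:</p>

```rust
use std::sync::Mutex;

// Hypothetical static for illustration. Without the `Mutex`, the static
// would be read-only, since `static mut` access requires `unsafe`.
// (Assumption: std's Mutex replaces the kernel's spin::Mutex here.)
static COUNTER: Mutex<u32> = Mutex::new(0);

/// Increments the shared counter and returns the new value.
fn increment() -> u32 {
    // `lock` waits until no other guard exists, so at most one
    // mutable access to the inner value is possible at a time.
    let mut guard = COUNTER.lock().unwrap();
    *guard += 1;
    *guard
    // The guard is dropped here, which releases the lock again.
}

fn main() {
    increment();
    increment();
    println!("counter = {}", *COUNTER.lock().unwrap());
}
```

<p>Note that no reference to <code>COUNTER</code> needs to be passed around: every function can name the static directly, and the mutex serializes the modifications.</p> <p>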
We already used a <code>Mutex</code> for our <a href="https://os.phil-opp.com/vga-text-mode/#spinlocks">static VGA buffer <code>Writer</code></a>.</p> <h2 id="dynamic-memory"><a class="zola-anchor" href="#dynamic-memory" aria-label="Anchor link for: dynamic-memory">🔗</a>Dynamic Memory</h2> <p>Local and static variables are already very powerful together and enable most use cases. However, we saw that they both have their limitations:</p> <ul> <li>Local variables only live until the end of the surrounding function or block. This is because they live on the call stack and are destroyed after the surrounding function returns.</li> <li>Static variables always live for the complete runtime of the program, so there is no way to reclaim and reuse their memory when they’re no longer needed. Also, they have unclear ownership semantics and are accessible from all functions, so they need to be protected by a <a href="https://docs.rs/spin/0.5.2/spin/struct.Mutex.html"><code>Mutex</code></a> when we want to modify them.</li> </ul> <p>Another limitation of local and static variables is that they have a fixed size. So they can’t store a collection that dynamically grows when more elements are added. (There are proposals for <a href="https://github.com/rust-lang/rust/issues/48055">unsized rvalues</a> in Rust that would allow dynamically sized local variables, but they only work in some specific cases.)</p> <p>To circumvent these drawbacks, programming languages often support a third memory region for storing variables called the <strong>heap</strong>. The heap supports <em>dynamic memory allocation</em> at runtime through two functions called <code>allocate</code> and <code>deallocate</code>. It works in the following way: The <code>allocate</code> function returns a free chunk of memory of the specified size that can be used to store a variable. 
This variable then lives until it is freed by calling the <code>deallocate</code> function with a reference to the variable.</p> <p>Let’s go through an example:</p> <p><img src="https://os.phil-opp.com/heap-allocation/call-stack-heap.svg" alt="The inner function calls allocate(size_of([u32; 3])), writes z.write([1,2,3]);, and returns (z as *mut u32).offset(i). On the returned value y, the outer function performs deallocate(y, size_of(u32))." /></p> <p>Here the <code>inner</code> function uses heap memory instead of static variables for storing <code>z</code>. It first allocates a memory block of the required size, which returns a <code>*mut u32</code> <a href="https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html#dereferencing-a-raw-pointer">raw pointer</a>. It then uses the <a href="https://doc.rust-lang.org/core/ptr/fn.write.html"><code>ptr::write</code></a> method to write the array <code>[1,2,3]</code> to it. In the last step, it uses the <a href="https://doc.rust-lang.org/std/primitive.pointer.html#method.offset"><code>offset</code></a> function to calculate a pointer to the <code>i</code>-th element and then returns it. (Note that we omitted some required casts and unsafe blocks in this example function for brevity.)</p> <p>The allocated memory lives until it is explicitly freed through a call to <code>deallocate</code>. Thus, the returned pointer is still valid even after <code>inner</code> returned and its part of the call stack was destroyed. The advantage of using heap memory compared to static memory is that the memory can be reused after it is freed, which we do through the <code>deallocate</code> call in <code>outer</code>. After that call, the situation looks like this:</p> <p><img src="https://os.phil-opp.com/heap-allocation/call-stack-heap-freed.svg" alt="The call stack contains the local variables of outer, the heap contains z[0] and z[2], but no longer z[1]." 
/></p> <p>We see that the <code>z[1]</code> slot is free again and can be reused for the next <code>allocate</code> call. However, we also see that <code>z[0]</code> and <code>z[2]</code> are never freed because we never deallocate them. Such a bug is called a <em>memory leak</em> and is often the cause of excessive memory consumption of programs (just imagine what happens when we call <code>inner</code> repeatedly in a loop). This might seem bad, but there are much more dangerous types of bugs that can happen with dynamic allocation.</p> <h3 id="common-errors"><a class="zola-anchor" href="#common-errors" aria-label="Anchor link for: common-errors">🔗</a>Common Errors</h3> <p>Apart from memory leaks, which are unfortunate but don’t make the program vulnerable to attackers, there are two common types of bugs with more severe consequences:</p> <ul> <li>When we accidentally continue to use a variable after calling <code>deallocate</code> on it, we have a so-called <strong>use-after-free</strong> vulnerability. Such a bug causes undefined behavior and can often be exploited by attackers to execute arbitrary code.</li> <li>When we accidentally free a variable twice, we have a <strong>double-free</strong> vulnerability. This is problematic because it might free a different allocation that was allocated in the same spot after the first <code>deallocate</code> call. Thus, it can lead to a use-after-free vulnerability again.</li> </ul> <p>These types of vulnerabilities are commonly known, so one might expect that people have learned how to avoid them by now. But no, such vulnerabilities are still regularly found, for example, this <a href="https://securityboulevard.com/2019/02/linux-use-after-free-vulnerability-found-in-linux-2-6-through-4-20-11/">use-after-free vulnerability in Linux</a> (2019), which allowed arbitrary code execution. A web search like <code>use-after-free linux {current year}</code> will probably always yield results. 
This shows that even the best programmers are not always able to correctly handle dynamic memory in complex projects.</p> <p>To avoid these issues, many languages, such as Java or Python, manage dynamic memory automatically using a technique called <a href="https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)"><em>garbage collection</em></a>. The idea is that the programmer never invokes <code>deallocate</code> manually. Instead, the program is regularly paused and scanned for unused heap variables, which are then automatically deallocated. Thus, the above vulnerabilities can never occur. The drawbacks are the performance overhead of the regular scan and the potentially long pause times.</p> <p>Rust takes a different approach to the problem: It uses a concept called <a href="https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html"><em>ownership</em></a> that is able to check the correctness of dynamic memory operations at compile time. Thus, no garbage collection is needed to avoid the mentioned vulnerabilities, which means that there is no performance overhead. Another advantage of this approach is that the programmer still has fine-grained control over the use of dynamic memory, just like with C or C++.</p> <h3 id="allocations-in-rust"><a class="zola-anchor" href="#allocations-in-rust" aria-label="Anchor link for: allocations-in-rust">🔗</a>Allocations in Rust</h3> <p>Instead of letting the programmer manually call <code>allocate</code> and <code>deallocate</code>, the Rust standard library provides abstraction types that call these functions implicitly. The most important type is <a href="https://doc.rust-lang.org/std/boxed/index.html"><strong><code>Box</code></strong></a>, which is an abstraction for a heap-allocated value. 
It provides a <a href="https://doc.rust-lang.org/alloc/boxed/struct.Box.html#method.new"><code>Box::new</code></a> constructor function that takes a value, calls <code>allocate</code> with the size of the value, and then moves the value to the newly allocated slot on the heap. To free the heap memory again, the <code>Box</code> type implements the <a href="https://doc.rust-lang.org/book/ch15-03-drop.html"><code>Drop</code> trait</a> to call <code>deallocate</code> when it goes out of scope:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> z = Box::new([</span><span style="color:#b5cea8;">1</span><span>,</span><span style="color:#b5cea8;">2</span><span>,</span><span style="color:#b5cea8;">3</span><span>]); </span><span> […] </span><span>} </span><span style="color:#608b4e;">// z goes out of scope and `deallocate` is called </span></code></pre> <p>This pattern has the strange name <a href="https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization"><em>resource acquisition is initialization</em></a> (or <em>RAII</em> for short). 
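</p> <p>To make the pattern more concrete, here is a small sketch of a type that acquires a resource in its constructor and releases it in <code>Drop</code>. The <code>Guard</code> type, the <code>LIVE</code> counter, and the <code>live_guards</code> function are hypothetical names for illustration and not part of the kernel; the counter merely stands in for a real resource such as a heap allocation:</p>

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical bookkeeping: the number of `Guard` values currently alive.
static LIVE: AtomicUsize = AtomicUsize::new(0);

// Hypothetical RAII type that stands in for something like `Box`.
struct Guard;

impl Guard {
    /// "Acquires" the resource by incrementing the live counter.
    fn new() -> Self {
        LIVE.fetch_add(1, Ordering::SeqCst);
        Guard
    }
}

impl Drop for Guard {
    /// "Releases" the resource; the compiler inserts this call
    /// automatically when the value goes out of scope.
    fn drop(&mut self) {
        LIVE.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Returns the number of guards that are currently alive.
fn live_guards() -> usize {
    LIVE.load(Ordering::SeqCst)
}

fn main() {
    {
        let _guard = Guard::new();
        println!("inside scope: {}", live_guards());
    } // `_guard` goes out of scope and `drop` runs
    println!("after scope: {}", live_guards());
}
```

<p>The release step cannot be forgotten because it is tied to the scope of the value rather than to an explicit call, which is the essence of RAII.</p> <p>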
It originated in C++, where it is used to implement a similar abstraction type called <a href="https://en.cppreference.com/w/cpp/memory/unique_ptr"><code>std::unique_ptr</code></a>.</p> <p>Such a type alone does not suffice to prevent all use-after-free bugs since programmers can still hold on to references after the <code>Box</code> goes out of scope and the corresponding heap memory slot is deallocated:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">let</span><span> x = { </span><span> </span><span style="color:#569cd6;">let</span><span> z = Box::new([</span><span style="color:#b5cea8;">1</span><span>,</span><span style="color:#b5cea8;">2</span><span>,</span><span style="color:#b5cea8;">3</span><span>]); </span><span> </span><span style="color:#569cd6;">&amp;</span><span>z[</span><span style="color:#b5cea8;">1</span><span>] </span><span>}; </span><span style="color:#608b4e;">// z goes out of scope and `deallocate` is called </span><span>println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, x); </span></code></pre> <p>This is where Rust’s ownership comes in. It assigns an abstract <a href="https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html">lifetime</a> to each reference, which is the scope in which the reference is valid. In the above example, the <code>x</code> reference is taken from the <code>z</code> array, so it becomes invalid after <code>z</code> goes out of scope. 
When you <a href="https://play.rust-lang.org/?version=stable&amp;mode=debug&amp;edition=2018&amp;gist=28180d8de7b62c6b4a681a7b1f745a48">run the above example on the playground</a> you see that the Rust compiler indeed throws an error:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error[E0597]: `z[_]` does not live long enough </span><span> --&gt; src/main.rs:4:9 </span><span> | </span><span>2 | let x = { </span><span> | - borrow later stored here </span><span>3 | let z = Box::new([1,2,3]); </span><span>4 | &amp;z[1] </span><span> | ^^^^^ borrowed value does not live long enough </span><span>5 | }; // z goes out of scope and `deallocate` is called </span><span> | - `z[_]` dropped here while still borrowed </span></code></pre> <p>The terminology can be a bit confusing at first. Taking a reference to a value is called <em>borrowing</em> the value since it’s similar to a borrow in real life: You have temporary access to an object but need to return it sometime, and you must not destroy it. By checking that all borrows end before an object is destroyed, the Rust compiler can guarantee that no use-after-free situation can occur.</p> <p>Rust’s ownership system goes even further, preventing not only use-after-free bugs but also providing complete <a href="https://en.wikipedia.org/wiki/Memory_safety"><em>memory safety</em></a>, as garbage collected languages like Java or Python do. Additionally, it guarantees <a href="https://en.wikipedia.org/wiki/Thread_safety"><em>thread safety</em></a> and is thus even safer than those languages in multi-threaded code. And most importantly, all these checks happen at compile time, so there is no runtime overhead compared to hand-written memory management in C.</p> <h3 id="use-cases"><a class="zola-anchor" href="#use-cases" aria-label="Anchor link for: use-cases">🔗</a>Use Cases</h3> <p>We now know the basics of dynamic memory allocation in Rust, but when should we use it? 
We’ve come really far with our kernel without dynamic memory allocation, so why do we need it now?</p> <p>First, dynamic memory allocation always comes with a bit of performance overhead since we need to find a free slot on the heap for every allocation. For this reason, local variables are generally preferable, especially in performance-sensitive kernel code. However, there are cases where dynamic memory allocation is the best choice.</p> <p>As a basic rule, dynamic memory is required for variables that have a dynamic lifetime or a variable size. The most important type with a dynamic lifetime is <a href="https://doc.rust-lang.org/alloc/rc/index.html"><strong><code>Rc</code></strong></a>, which counts the references to its wrapped value and deallocates it after all references have gone out of scope. Examples of types with a variable size are <a href="https://doc.rust-lang.org/alloc/vec/index.html"><strong><code>Vec</code></strong></a>, <a href="https://doc.rust-lang.org/alloc/string/index.html"><strong><code>String</code></strong></a>, and other <a href="https://doc.rust-lang.org/alloc/collections/index.html">collection types</a> that dynamically grow when more elements are added. These types work by allocating a larger amount of memory when they become full, copying all elements over, and then deallocating the old allocation.</p> <p>For our kernel, we will mostly need the collection types, for example, to store a list of active tasks when implementing multitasking in future posts.</p> <h2 id="the-allocator-interface"><a class="zola-anchor" href="#the-allocator-interface" aria-label="Anchor link for: the-allocator-interface">🔗</a>The Allocator Interface</h2> <p>The first step in implementing a heap allocator is to add a dependency on the built-in <a href="https://doc.rust-lang.org/alloc/"><code>alloc</code></a> crate. 
Like the <a href="https://doc.rust-lang.org/core/"><code>core</code></a> crate, it is a subset of the standard library that additionally contains the allocation and collection types. To add the dependency on <code>alloc</code>, we add the following to our <code>lib.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">extern crate</span><span> alloc; </span></code></pre> <p>Unlike normal dependencies, this one doesn’t require modifying the <code>Cargo.toml</code>. The reason is that the <code>alloc</code> crate ships with the Rust compiler as part of the standard library, so the compiler already knows about the crate. By adding this <code>extern crate</code> statement, we specify that the compiler should try to include it. (Historically, all dependencies needed an <code>extern crate</code> statement, which is now optional.)</p> <p>Since we are compiling for a custom target, we can’t use the precompiled version of <code>alloc</code> that is shipped with the Rust installation. Instead, we have to tell cargo to recompile the crate from source. 
We can do that by adding it to the <code>unstable.build-std</code> array in our <code>.cargo/config.toml</code> file:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in .cargo/config.toml </span><span> </span><span>[</span><span style="color:#808080;">unstable</span><span>] </span><span style="color:#569cd6;">build-std </span><span>= [</span><span style="color:#d69d85;">&quot;core&quot;</span><span>, </span><span style="color:#d69d85;">&quot;compiler_builtins&quot;</span><span>, </span><span style="color:#d69d85;">&quot;alloc&quot;</span><span>] </span></code></pre> <p>Now the compiler will recompile and include the <code>alloc</code> crate in our kernel.</p> <p>The reason that the <code>alloc</code> crate is disabled by default in <code>#[no_std]</code> crates is that it has additional requirements. When we try to compile our project now, we will see these requirements as errors:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: no global memory allocator found but one is required; link to std or add </span><span> #[global_allocator] to a static item that implements the GlobalAlloc trait. </span></code></pre> <p>The error occurs because the <code>alloc</code> crate requires a heap allocator, which is an object that provides the <code>allocate</code> and <code>deallocate</code> functions. In Rust, heap allocators are described by the <a href="https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html"><code>GlobalAlloc</code></a> trait, which is mentioned in the error message. 
To set the heap allocator for the crate, the <code>#[global_allocator]</code> attribute must be applied to a <code>static</code> variable that implements the <code>GlobalAlloc</code> trait.</p> <h3 id="the-globalalloc-trait"><a class="zola-anchor" href="#the-globalalloc-trait" aria-label="Anchor link for: the-globalalloc-trait">🔗</a>The <code>GlobalAlloc</code> Trait</h3> <p>The <a href="https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html"><code>GlobalAlloc</code></a> trait defines the functions that a heap allocator must provide. The trait is special because it is almost never used directly by the programmer. Instead, the compiler will automatically insert the appropriate calls to the trait methods when using the allocation and collection types of <code>alloc</code>.</p> <p>Since we will need to implement the trait for all our allocator types, it is worth taking a closer look at its declaration:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub unsafe trait </span><span>GlobalAlloc { </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>alloc(</span><span style="color:#569cd6;">&amp;</span><span>self, layout: Layout) -&gt; </span><span style="color:#569cd6;">*mut u8</span><span>; </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>dealloc(</span><span style="color:#569cd6;">&amp;</span><span>self, ptr: </span><span style="color:#569cd6;">*mut u8</span><span>, layout: Layout); </span><span> </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>alloc_zeroed(</span><span style="color:#569cd6;">&amp;</span><span>self, layout: Layout) -&gt; </span><span style="color:#569cd6;">*mut u8 </span><span>{ </span><span style="color:#569cd6;">... 
</span><span>} </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>realloc( </span><span> </span><span style="color:#569cd6;">&amp;</span><span>self, </span><span> ptr: </span><span style="color:#569cd6;">*mut u8</span><span>, </span><span> layout: Layout, </span><span> new_size: </span><span style="color:#569cd6;">usize </span><span> ) -&gt; </span><span style="color:#569cd6;">*mut u8 </span><span>{ </span><span style="color:#569cd6;">... </span><span>} </span><span>} </span></code></pre> <p>It defines the two required methods <a href="https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc"><code>alloc</code></a> and <a href="https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc"><code>dealloc</code></a>, which correspond to the <code>allocate</code> and <code>deallocate</code> functions we used in our examples:</p> <ul> <li>The <a href="https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc"><code>alloc</code></a> method takes a <a href="https://doc.rust-lang.org/alloc/alloc/struct.Layout.html"><code>Layout</code></a> instance as an argument, which describes the desired size and alignment that the allocated memory should have. It returns a <a href="https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html#dereferencing-a-raw-pointer">raw pointer</a> to the first byte of the allocated memory block. Instead of an explicit error value, the <code>alloc</code> method returns a null pointer to signal an allocation error. This is a bit non-idiomatic, but it has the advantage that wrapping existing system allocators is easy since they use the same convention.</li> <li>The <a href="https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc"><code>dealloc</code></a> method is the counterpart and is responsible for freeing a memory block again. 
It receives two arguments: the pointer returned by <code>alloc</code> and the <code>Layout</code> that was used for the allocation.</li> </ul> <p>The trait additionally defines the two methods <a href="https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#method.alloc_zeroed"><code>alloc_zeroed</code></a> and <a href="https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#method.realloc"><code>realloc</code></a> with default implementations:</p> <ul> <li>The <a href="https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#method.alloc_zeroed"><code>alloc_zeroed</code></a> method is equivalent to calling <code>alloc</code> and then setting the allocated memory block to zero, which is exactly what the provided default implementation does. An allocator implementation can override the default with a more efficient custom implementation if possible.</li> <li>The <a href="https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#method.realloc"><code>realloc</code></a> method grows or shrinks an existing allocation. The default implementation allocates a new memory block with the desired size, copies over all the content from the previous allocation, and then frees the old block. Again, an allocator implementation can often provide a more efficient version of this method, for example by growing or shrinking the allocation in place if possible.</li> </ul> <h4 id="unsafety"><a class="zola-anchor" href="#unsafety" aria-label="Anchor link for: unsafety">🔗</a>Unsafety</h4> <p>One thing to notice is that both the trait itself and all trait methods are declared as <code>unsafe</code>:</p> <ul> <li>The reason for declaring the trait as <code>unsafe</code> is that the programmer must guarantee that the trait implementation for an allocator type is correct. 
For example, the <code>alloc</code> method must never return a memory block that is already used somewhere else because this would cause undefined behavior.</li> <li>Similarly, the reason that the methods are <code>unsafe</code> is that the caller must ensure various invariants when calling the methods, for example, that the <code>Layout</code> passed to <code>alloc</code> specifies a non-zero size. This is not really relevant in practice since the methods are normally called directly by the compiler, which ensures that the requirements are met.</li> </ul> <h3 id="a-dummyallocator"><a class="zola-anchor" href="#a-dummyallocator" aria-label="Anchor link for: a-dummyallocator">🔗</a>A <code>DummyAllocator</code></h3> <p>Now that we know what an allocator type should provide, we can create a simple dummy allocator. For that, we create a new <code>allocator</code> module:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">pub mod </span><span>allocator; </span></code></pre> <p>Our dummy allocator does the absolute minimum to implement the trait and always returns an error when <code>alloc</code> is called. 
It looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator.rs </span><span> </span><span style="color:#569cd6;">use </span><span>alloc::alloc::{GlobalAlloc, Layout}; </span><span style="color:#569cd6;">use </span><span>core::ptr::null_mut; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>Dummy; </span><span> </span><span style="color:#569cd6;">unsafe impl </span><span>GlobalAlloc </span><span style="color:#569cd6;">for </span><span>Dummy { </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>alloc(</span><span style="color:#569cd6;">&amp;</span><span>self, _layout: Layout) -&gt; </span><span style="color:#569cd6;">*mut u8 </span><span>{ </span><span> null_mut() </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>dealloc(</span><span style="color:#569cd6;">&amp;</span><span>self, _ptr: </span><span style="color:#569cd6;">*mut u8</span><span>, _layout: Layout) { </span><span> panic!(</span><span style="color:#d69d85;">&quot;dealloc should never be called&quot;</span><span>) </span><span> } </span><span>} </span></code></pre> <p>The struct does not need any fields, so we create it as a <a href="https://doc.rust-lang.org/nomicon/exotic-sizes.html#zero-sized-types-zsts">zero-sized type</a>. As mentioned above, we always return the null pointer from <code>alloc</code>, which corresponds to an allocation error. Since the allocator never returns any memory, a call to <code>dealloc</code> should never occur. For this reason, we simply panic in the <code>dealloc</code> method. 
The <code>alloc_zeroed</code> and <code>realloc</code> methods have default implementations, so we don’t need to provide implementations for them.</p> <p>We now have a simple allocator, but we still have to tell the Rust compiler that it should use this allocator. This is where the <code>#[global_allocator]</code> attribute comes in.</p> <h3 id="the-global-allocator-attribute"><a class="zola-anchor" href="#the-global-allocator-attribute" aria-label="Anchor link for: the-global-allocator-attribute">🔗</a>The <code>#[global_allocator]</code> Attribute</h3> <p>The <code>#[global_allocator]</code> attribute tells the Rust compiler which allocator instance it should use as the global heap allocator. The attribute is only applicable to a <code>static</code> that implements the <code>GlobalAlloc</code> trait. Let’s register an instance of our <code>Dummy</code> allocator as the global allocator:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator.rs </span><span> </span><span>#[global_allocator] </span><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">ALLOCATOR</span><span>: Dummy = Dummy; </span></code></pre> <p>Since the <code>Dummy</code> allocator is a <a href="https://doc.rust-lang.org/nomicon/exotic-sizes.html#zero-sized-types-zsts">zero-sized type</a>, we don’t need to specify any fields in the initialization expression.</p> <p>With this static, the compilation errors should be fixed. Now we can use the allocation and collection types of <code>alloc</code>. 
For example, we can use a <a href="https://doc.rust-lang.org/alloc/boxed/struct.Box.html"><code>Box</code></a> to allocate a value on the heap:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">extern crate</span><span> alloc; </span><span> </span><span style="color:#569cd6;">use </span><span>alloc::boxed::Box; </span><span> </span><span style="color:#569cd6;">fn </span><span>kernel_main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#608b4e;">// […] print &quot;Hello World!&quot;, call `init`, create `mapper` and `frame_allocator` </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> x = Box::new(</span><span style="color:#b5cea8;">41</span><span>); </span><span> </span><span> </span><span style="color:#608b4e;">// […] call `test_main` in test mode </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> blog_os::hlt_loop(); </span><span>} </span><span> </span></code></pre> <p>Note that we need to specify the <code>extern crate alloc</code> statement in our <code>main.rs</code> too. This is required because the <code>lib.rs</code> and <code>main.rs</code> parts are treated as separate crates. However, we don’t need to create another <code>#[global_allocator]</code> static because the global allocator applies to all crates in the project. 
In fact, specifying an additional allocator in another crate would be an error.</p> <p>When we run the above code, we see that a panic occurs:</p> <p><img src="https://os.phil-opp.com/heap-allocation/qemu-dummy-output.png" alt="QEMU printing “panicked at `allocation error: Layout { size_: 4, align_: 4 }`, src/lib.rs:89:5”" /></p> <p>The panic occurs because the <code>Box::new</code> function implicitly calls the <code>alloc</code> function of the global allocator. Our dummy allocator always returns a null pointer, so every allocation fails. To fix this, we need to create an allocator that actually returns usable memory.</p> <h2 id="creating-a-kernel-heap"><a class="zola-anchor" href="#creating-a-kernel-heap" aria-label="Anchor link for: creating-a-kernel-heap">🔗</a>Creating a Kernel Heap</h2> <p>Before we can create a proper allocator, we first need to create a heap memory region from which the allocator can allocate memory. To do this, we need to define a virtual memory range for the heap region and then map this region to physical frames. See the <a href="https://os.phil-opp.com/paging-introduction/"><em>“Introduction To Paging”</em></a> post for an overview of virtual memory and page tables.</p> <p>The first step is to define a virtual memory region for the heap. We can choose any virtual address range that we like, as long as it is not already used for a different memory region. 
Let’s define it as the memory starting at address <code>0x_4444_4444_0000</code> so that we can easily recognize a heap pointer later:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator.rs </span><span> </span><span style="color:#569cd6;">pub const </span><span style="color:#b4cea8;">HEAP_START</span><span>: </span><span style="color:#569cd6;">usize </span><span>= </span><span style="color:#b5cea8;">0x_4444_4444_0000</span><span>; </span><span style="color:#569cd6;">pub const </span><span style="color:#b4cea8;">HEAP_SIZE</span><span>: </span><span style="color:#569cd6;">usize </span><span>= </span><span style="color:#b5cea8;">100 </span><span>* </span><span style="color:#b5cea8;">1024</span><span>; </span><span style="color:#608b4e;">// 100 KiB </span></code></pre> <p>We set the heap size to 100 KiB for now. If we need more space in the future, we can simply increase it.</p> <p>If we tried to use this heap region now, a page fault would occur since the virtual memory region is not mapped to physical memory yet. 
To resolve this, we create an <code>init_heap</code> function that maps the heap pages using the <a href="https://os.phil-opp.com/paging-implementation/#using-offsetpagetable"><code>Mapper</code> API</a> that we introduced in the <a href="https://os.phil-opp.com/paging-implementation/"><em>“Paging Implementation”</em></a> post:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::{ </span><span> structures::paging::{ </span><span> mapper::MapToError, FrameAllocator, Mapper, Page, PageTableFlags, Size4KiB, </span><span> }, </span><span> VirtAddr, </span><span>}; </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init_heap( </span><span> mapper: </span><span style="color:#569cd6;">&amp;mut</span><span> impl Mapper&lt;Size4KiB&gt;, </span><span> frame_allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> impl FrameAllocator&lt;Size4KiB&gt;, </span><span>) -&gt; Result&lt;(), MapToError&lt;Size4KiB&gt;&gt; { </span><span> </span><span style="color:#569cd6;">let</span><span> page_range = { </span><span> </span><span style="color:#569cd6;">let</span><span> heap_start = VirtAddr::new(</span><span style="color:#b4cea8;">HEAP_START </span><span style="color:#569cd6;">as u64</span><span>); </span><span> </span><span style="color:#569cd6;">let</span><span> heap_end = heap_start + </span><span style="color:#b4cea8;">HEAP_SIZE </span><span>- </span><span style="color:#b5cea8;">1</span><span style="color:#569cd6;">u64</span><span>; </span><span> </span><span style="color:#569cd6;">let</span><span> heap_start_page = Page::containing_address(heap_start); </span><span> </span><span style="color:#569cd6;">let</span><span> heap_end_page = Page::containing_address(heap_end); </span><span> Page::range_inclusive(heap_start_page, 
heap_end_page) </span><span> }; </span><span> </span><span> </span><span style="color:#569cd6;">for</span><span> page </span><span style="color:#569cd6;">in</span><span> page_range { </span><span> </span><span style="color:#569cd6;">let</span><span> frame = frame_allocator </span><span> .allocate_frame() </span><span> .ok_or(MapToError::FrameAllocationFailed)</span><span style="color:#569cd6;">?</span><span>; </span><span> </span><span style="color:#569cd6;">let</span><span> flags = PageTableFlags::</span><span style="color:#b4cea8;">PRESENT </span><span style="color:#569cd6;">| </span><span>PageTableFlags::</span><span style="color:#b4cea8;">WRITABLE</span><span>; </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> mapper.map_to(page, frame, flags, frame_allocator)</span><span style="color:#569cd6;">?</span><span>.flush() </span><span> }; </span><span> } </span><span> </span><span> Ok(()) </span><span>} </span></code></pre> <p>The function takes mutable references to a <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html"><code>Mapper</code></a> and a <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html"><code>FrameAllocator</code></a> instance, both limited to 4 KiB pages by using <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/enum.Size4KiB.html"><code>Size4KiB</code></a> as the generic parameter. The return value of the function is a <a href="https://doc.rust-lang.org/core/result/enum.Result.html"><code>Result</code></a> with the unit type <code>()</code> as the success variant and a <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/enum.MapToError.html"><code>MapToError</code></a> as the error variant, which is the error type returned by the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to"><code>Mapper::map_to</code></a> method. 
Reusing the error type makes sense here because the <code>map_to</code> method is the main source of errors in this function.</p> <p>The implementation can be broken down into two parts:</p> <ul> <li> <p><strong>Creating the page range:</strong> To create a range of the pages that we want to map, we convert the <code>HEAP_START</code> pointer to a <a href="https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.VirtAddr.html"><code>VirtAddr</code></a> type. Then we calculate the heap end address from it by adding the <code>HEAP_SIZE</code>. We want an inclusive bound (the address of the last byte of the heap), so we subtract 1. Next, we convert the addresses into <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html"><code>Page</code></a> types using the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html#method.containing_address"><code>containing_address</code></a> function. Finally, we create a page range from the start and end pages using the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html#method.range_inclusive"><code>Page::range_inclusive</code></a> function.</p> </li> <li> <p><strong>Mapping the pages:</strong> The second step is to map all pages of the page range we just created. For that, we iterate over these pages using a <code>for</code> loop. For each page, we do the following:</p> <ul> <li> <p>We allocate a physical frame that the page should be mapped to using the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html#tymethod.allocate_frame"><code>FrameAllocator::allocate_frame</code></a> method. This method returns <a href="https://doc.rust-lang.org/core/option/enum.Option.html#variant.None"><code>None</code></a> when there are no more frames left. 
We deal with that case by mapping it to a <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/enum.MapToError.html#variant.FrameAllocationFailed"><code>MapToError::FrameAllocationFailed</code></a> error through the <a href="https://doc.rust-lang.org/core/option/enum.Option.html#method.ok_or"><code>Option::ok_or</code></a> method and then applying the <a href="https://doc.rust-lang.org/edition-guide/rust-2018/error-handling-and-panics/the-question-mark-operator-for-easier-error-handling.html">question mark operator</a> to return early in the case of an error.</p> </li> <li> <p>We set the required <code>PRESENT</code> flag and the <code>WRITABLE</code> flag for the page. With these flags, both read and write accesses are allowed, which makes sense for heap memory.</p> </li> <li> <p>We use the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to"><code>Mapper::map_to</code></a> method for creating the mapping in the active page table. The method can fail, so we use the <a href="https://doc.rust-lang.org/edition-guide/rust-2018/error-handling-and-panics/the-question-mark-operator-for-easier-error-handling.html">question mark operator</a> again to forward the error to the caller. 
On success, the method returns a <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html"><code>MapperFlush</code></a> instance that we can use to update the <a href="https://os.phil-opp.com/paging-introduction/#the-translation-lookaside-buffer"><em>translation lookaside buffer</em></a> using the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html#method.flush"><code>flush</code></a> method.</p> </li> </ul> </li> </ul> <p>The final step is to call this function from our <code>kernel_main</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">fn </span><span>kernel_main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::allocator; </span><span style="color:#608b4e;">// new import </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::memory::{self, BootInfoFrameAllocator}; </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> blog_os::init(); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); </span><span> </span><span style="color:#569cd6;">let mut</span><span> mapper = </span><span style="color:#569cd6;">unsafe </span><span>{ memory::init(phys_mem_offset) }; </span><span> </span><span style="color:#569cd6;">let mut</span><span> frame_allocator = </span><span style="color:#569cd6;">unsafe </span><span>{ 
</span><span> BootInfoFrameAllocator::init(</span><span style="color:#569cd6;">&amp;</span><span>boot_info.memory_map) </span><span> }; </span><span> </span><span> </span><span style="color:#608b4e;">// new </span><span> allocator::init_heap(</span><span style="color:#569cd6;">&amp;mut</span><span> mapper, </span><span style="color:#569cd6;">&amp;mut</span><span> frame_allocator) </span><span> .expect(</span><span style="color:#d69d85;">&quot;heap initialization failed&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> x = Box::new(</span><span style="color:#b5cea8;">41</span><span>); </span><span> </span><span> </span><span style="color:#608b4e;">// […] call `test_main` in test mode </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> blog_os::hlt_loop(); </span><span>} </span></code></pre> <p>We show the full function for context here. The only new lines are the <code>blog_os::allocator</code> import and the call to the <code>allocator::init_heap</code> function. In case the <code>init_heap</code> function returns an error, we panic using the <a href="https://doc.rust-lang.org/core/result/enum.Result.html#method.expect"><code>Result::expect</code></a> method since there is currently no sensible way for us to handle this error.</p> <p>We now have a mapped heap memory region that is ready to be used. The <code>Box::new</code> call still uses our old <code>Dummy</code> allocator, so you will still see the same allocation error panic when you run it. Let’s fix this by using a proper allocator.</p> <h2 id="using-an-allocator-crate"><a class="zola-anchor" href="#using-an-allocator-crate" aria-label="Anchor link for: using-an-allocator-crate">🔗</a>Using an Allocator Crate</h2> <p>Since implementing an allocator is somewhat complex, we start by using an external allocator crate. 
We will learn how to implement our own allocator in the next post.</p> <p>A simple allocator crate for <code>no_std</code> applications is the <a href="https://github.com/phil-opp/linked-list-allocator/"><code>linked_list_allocator</code></a> crate. Its name comes from the fact that it uses a linked list data structure to keep track of deallocated memory regions. See the next post for a more detailed explanation of this approach.</p> <p>To use the crate, we first need to add a dependency on it in our <code>Cargo.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">linked_list_allocator </span><span>= </span><span style="color:#d69d85;">&quot;0.9.0&quot; </span></code></pre> <p>Then we can replace our dummy allocator with the allocator provided by the crate:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator.rs </span><span> </span><span style="color:#569cd6;">use </span><span>linked_list_allocator::LockedHeap; </span><span> </span><span>#[global_allocator] </span><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">ALLOCATOR</span><span>: LockedHeap = LockedHeap::empty(); </span></code></pre> <p>The struct is named <code>LockedHeap</code> because it uses the <a href="https://docs.rs/spinning_top/0.1.0/spinning_top/type.Spinlock.html"><code>spinning_top::Spinlock</code></a> type for synchronization. This is required because multiple threads could access the <code>ALLOCATOR</code> static at the same time. As always, when using a spinlock or a mutex, we need to be careful to not accidentally cause a deadlock. 
This means that we shouldn’t perform any allocations in interrupt handlers, since they can run at an arbitrary time and might interrupt an in-progress allocation.</p> <p>Setting the <code>LockedHeap</code> as global allocator is not enough. The reason is that we use the <a href="https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.LockedHeap.html#method.empty"><code>empty</code></a> constructor function, which creates an allocator without any backing memory. Like our dummy allocator, it always returns an error on <code>alloc</code>. To fix this, we need to initialize the allocator after creating the heap:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/allocator.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init_heap( </span><span> mapper: </span><span style="color:#569cd6;">&amp;mut</span><span> impl Mapper&lt;Size4KiB&gt;, </span><span> frame_allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> impl FrameAllocator&lt;Size4KiB&gt;, </span><span>) -&gt; Result&lt;(), MapToError&lt;Size4KiB&gt;&gt; { </span><span> </span><span style="color:#608b4e;">// […] map all heap pages to physical frames </span><span> </span><span> </span><span style="color:#608b4e;">// new </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#b4cea8;">ALLOCATOR</span><span>.lock().init(</span><span style="color:#b4cea8;">HEAP_START</span><span>, </span><span style="color:#b4cea8;">HEAP_SIZE</span><span>); </span><span> } </span><span> </span><span> Ok(()) </span><span>} </span></code></pre> <p>We use the <a href="https://docs.rs/lock_api/0.3.3/lock_api/struct.Mutex.html#method.lock"><code>lock</code></a> method on the inner spinlock of the <code>LockedHeap</code> type to get an exclusive reference to the wrapped <a 
href="https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html"><code>Heap</code></a> instance, on which we then call the <a href="https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.init"><code>init</code></a> method with the heap bounds as arguments. Because the <a href="https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.init"><code>init</code></a> function already tries to write to the heap memory, we must initialize the heap only <em>after</em> mapping the heap pages.</p> <p>After initializing the heap, we can now use all allocation and collection types of the built-in <a href="https://doc.rust-lang.org/alloc/"><code>alloc</code></a> crate without error:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">use </span><span>alloc::{boxed::Box, vec, vec::Vec, rc::Rc}; </span><span> </span><span style="color:#569cd6;">fn </span><span>kernel_main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#608b4e;">// […] initialize interrupts, mapper, frame_allocator, heap </span><span> </span><span> </span><span style="color:#608b4e;">// allocate a number on the heap </span><span> </span><span style="color:#569cd6;">let</span><span> heap_value = Box::new(</span><span style="color:#b5cea8;">41</span><span>); </span><span> println!(</span><span style="color:#d69d85;">&quot;heap_value at </span><span style="color:#b4cea8;">{:p}</span><span style="color:#d69d85;">&quot;</span><span>, heap_value); </span><span> </span><span> </span><span style="color:#608b4e;">// create a dynamically sized vector </span><span> </span><span style="color:#569cd6;">let mut</span><span> vec = Vec::new(); </span><span> </span><span style="color:#569cd6;">for</span><span> i </span><span style="color:#569cd6;">in </span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">500 </span><span>{ </span><span> vec.push(i); </span><span> } </span><span> println!(</span><span style="color:#d69d85;">&quot;vec at </span><span style="color:#b4cea8;">{:p}</span><span style="color:#d69d85;">&quot;</span><span>, vec.as_slice()); </span><span> </span><span> </span><span style="color:#608b4e;">// create a reference counted vector -&gt; will be freed when count reaches 0 </span><span> </span><span style="color:#569cd6;">let</span><span> reference_counted = Rc::new(vec![</span><span style="color:#b5cea8;">1</span><span>, </span><span style="color:#b5cea8;">2</span><span>, </span><span style="color:#b5cea8;">3</span><span>]); </span><span> </span><span style="color:#569cd6;">let</span><span> cloned_reference = reference_counted.clone(); </span><span> println!(</span><span style="color:#d69d85;">&quot;current reference count is </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, Rc::strong_count(</span><span 
style="color:#569cd6;">&amp;</span><span>cloned_reference)); </span><span> core::mem::drop(reference_counted); </span><span> println!(</span><span style="color:#d69d85;">&quot;reference count is </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;"> now&quot;</span><span>, Rc::strong_count(</span><span style="color:#569cd6;">&amp;</span><span>cloned_reference)); </span><span> </span><span> </span><span style="color:#608b4e;">// […] call `test_main` in test context </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> blog_os::hlt_loop(); </span><span>} </span></code></pre> <p>This code example shows some uses of the <a href="https://doc.rust-lang.org/alloc/boxed/struct.Box.html"><code>Box</code></a>, <a href="https://doc.rust-lang.org/alloc/vec/"><code>Vec</code></a>, and <a href="https://doc.rust-lang.org/alloc/rc/"><code>Rc</code></a> types. For the <code>Box</code> and <code>Vec</code> types, we print the underlying heap pointers using the <a href="https://doc.rust-lang.org/core/fmt/trait.Pointer.html"><code>{:p}</code> formatting specifier</a>. To showcase <code>Rc</code>, we create a reference-counted heap value and use the <a href="https://doc.rust-lang.org/alloc/rc/struct.Rc.html#method.strong_count"><code>Rc::strong_count</code></a> function to print the current reference count before and after dropping an instance (using <a href="https://doc.rust-lang.org/core/mem/fn.drop.html"><code>core::mem::drop</code></a>).</p> <p>When we run it, we see the following:</p> <p><img src="https://os.phil-opp.com/heap-allocation/qemu-alloc-showcase.png" alt="QEMU printing: heap_value at 0x444444440000, vec at 0x444444440800, current reference count is 2, reference count is 1 now" /></p> <p>As expected, we see that the <code>Box</code> and <code>Vec</code> values live on the heap, as indicated by the pointers starting with the <code>0x_4444_4444_*</code> prefix. 
The reference counted value also behaves as expected, with the reference count being 2 after the <code>clone</code> call, and 1 again after one of the instances was dropped.</p> <p>The reason that the vector starts at offset <code>0x800</code> is not that the boxed value is <code>0x800</code> bytes large, but the <a href="https://doc.rust-lang.org/alloc/vec/struct.Vec.html#capacity-and-reallocation">reallocations</a> that occur when the vector needs to increase its capacity. For example, when the vector’s capacity is 32 and we try to add the next element, the vector allocates a new backing array with a capacity of 64 behind the scenes and copies all elements over. Then it frees the old allocation.</p> <p>Of course, there are many more allocation and collection types in the <code>alloc</code> crate that we can now all use in our kernel, including:</p> <ul> <li>the thread-safe reference counted pointer <a href="https://doc.rust-lang.org/alloc/sync/struct.Arc.html"><code>Arc</code></a></li> <li>the owned string type <a href="https://doc.rust-lang.org/alloc/string/struct.String.html"><code>String</code></a> and the <a href="https://doc.rust-lang.org/alloc/macro.format.html"><code>format!</code></a> macro</li> <li><a href="https://doc.rust-lang.org/alloc/collections/linked_list/struct.LinkedList.html"><code>LinkedList</code></a></li> <li>the growable ring buffer <a href="https://doc.rust-lang.org/alloc/collections/vec_deque/struct.VecDeque.html"><code>VecDeque</code></a></li> <li>the <a href="https://doc.rust-lang.org/alloc/collections/binary_heap/struct.BinaryHeap.html"><code>BinaryHeap</code></a> priority queue</li> <li><a href="https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html"><code>BTreeMap</code></a> and <a href="https://doc.rust-lang.org/alloc/collections/btree_set/struct.BTreeSet.html"><code>BTreeSet</code></a></li> </ul> <p>These types will become very useful when we want to implement thread lists, scheduling queues, or support for 
async/await.</p> <h2 id="adding-a-test"><a class="zola-anchor" href="#adding-a-test" aria-label="Anchor link for: adding-a-test">🔗</a>Adding a Test</h2> <p>To ensure that we don’t accidentally break our new allocation code, we should add an integration test for it. We start by creating a new <code>tests/heap_allocation.rs</code> file with the following content:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/heap_allocation.rs </span><span> </span><span>#![no_std] </span><span>#![no_main] </span><span>#![feature(custom_test_frameworks)] </span><span>#![test_runner(blog_os::test_runner)] </span><span>#![reexport_test_harness_main </span><span style="color:#569cd6;">= </span><span style="color:#d69d85;">&quot;test_main&quot;</span><span>] </span><span> </span><span style="color:#569cd6;">extern crate</span><span> alloc; </span><span> </span><span style="color:#569cd6;">use </span><span>bootloader::{entry_point, BootInfo}; </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span> </span><span>entry_point!(main); </span><span> </span><span style="color:#569cd6;">fn </span><span>main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> unimplemented!(); </span><span>} </span><span> </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> blog_os::test_panic_handler(info) </span><span>} </span></code></pre> <p>We reuse the <code>test_runner</code> and <code>test_panic_handler</code> functions from our <code>lib.rs</code>. 
Since we want to test allocations, we enable the <code>alloc</code> crate through the <code>extern crate alloc</code> statement. For more information about the test boilerplate, check out the <a href="https://os.phil-opp.com/testing/"><em>Testing</em></a> post.</p> <p>The implementation of the <code>main</code> function looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/heap_allocation.rs </span><span> </span><span style="color:#569cd6;">fn </span><span>main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::allocator; </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::memory::{self, BootInfoFrameAllocator}; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::VirtAddr; </span><span> </span><span> blog_os::init(); </span><span> </span><span style="color:#569cd6;">let</span><span> phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); </span><span> </span><span style="color:#569cd6;">let mut</span><span> mapper = </span><span style="color:#569cd6;">unsafe </span><span>{ memory::init(phys_mem_offset) }; </span><span> </span><span style="color:#569cd6;">let mut</span><span> frame_allocator = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> BootInfoFrameAllocator::init(</span><span style="color:#569cd6;">&amp;</span><span>boot_info.memory_map) </span><span> }; </span><span> allocator::init_heap(</span><span style="color:#569cd6;">&amp;mut</span><span> mapper, </span><span style="color:#569cd6;">&amp;mut</span><span> frame_allocator) </span><span> .expect(</span><span style="color:#d69d85;">&quot;heap initialization failed&quot;</span><span>); </span><span> </span><span> 
test_main(); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>It is very similar to the <code>kernel_main</code> function in our <code>main.rs</code>, with the differences that we don’t invoke <code>println</code>, don’t include any example allocations, and call <code>test_main</code> unconditionally.</p> <p>Now we’re ready to add a few test cases. First, we add a test that performs some simple allocations using <a href="https://doc.rust-lang.org/alloc/boxed/struct.Box.html"><code>Box</code></a> and checks the allocated values to ensure that basic allocations work:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/heap_allocation.rs </span><span style="color:#569cd6;">use </span><span>alloc::boxed::Box; </span><span> </span><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>simple_allocation() { </span><span> </span><span style="color:#569cd6;">let</span><span> heap_value_1 = Box::new(</span><span style="color:#b5cea8;">41</span><span>); </span><span> </span><span style="color:#569cd6;">let</span><span> heap_value_2 = Box::new(</span><span style="color:#b5cea8;">13</span><span>); </span><span> assert_eq!(*heap_value_1, </span><span style="color:#b5cea8;">41</span><span>); </span><span> assert_eq!(*heap_value_2, </span><span style="color:#b5cea8;">13</span><span>); </span><span>} </span></code></pre> <p>Most importantly, this test verifies that no allocation error occurs.</p> <p>Next, we iteratively build a large vector, to test both large allocations and multiple allocations (due to reallocations):</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/heap_allocation.rs </span><span> </span><span style="color:#569cd6;">use 
</span><span>alloc::vec::Vec; </span><span> </span><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>large_vec() { </span><span> </span><span style="color:#569cd6;">let</span><span> n = </span><span style="color:#b5cea8;">1000</span><span>; </span><span> </span><span style="color:#569cd6;">let mut</span><span> vec = Vec::new(); </span><span> </span><span style="color:#569cd6;">for</span><span> i </span><span style="color:#569cd6;">in </span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span>n { </span><span> vec.push(i); </span><span> } </span><span> assert_eq!(vec.iter().sum::&lt;</span><span style="color:#569cd6;">u64</span><span>&gt;(), (n - </span><span style="color:#b5cea8;">1</span><span>) * n / </span><span style="color:#b5cea8;">2</span><span>); </span><span>} </span></code></pre> <p>We verify the sum by comparing it with the formula for the <a href="https://en.wikipedia.org/wiki/1_%2B_2_%2B_3_%2B_4_%2B_%E2%8B%AF#Partial_sums">n-th partial sum</a>. 
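</p> <p>To make the formula concrete: the loop pushes the values 0, 1, …, <code>n - 1</code>, whose sum is <code>(n - 1) * n / 2</code>, which is 499500 for <code>n = 1000</code>. The following sketch checks this identity in a normal <code>std</code> program. The helper name <code>sum_below</code> is made up for this illustration and does not appear in the kernel code:</p>

```rust
/// Sums 0 + 1 + ... + (n - 1) by iterating, like the test's vector does.
/// (Hypothetical helper, only for this illustration.)
fn sum_below(n: u64) -> u64 {
    (0..n).sum()
}

fn main() {
    let n: u64 = 1000;
    // Closed form used in the `large_vec` test.
    let closed_form = (n - 1) * n / 2;
    assert_eq!(sum_below(n), closed_form);
    assert_eq!(closed_form, 499_500);
    println!("sum of 0..{} = {}", n, closed_form);
}
```

<p>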
The matching sum gives us some confidence that the allocated values are all correct.</p> <p>As a third test, we create a long series of allocations one after another, with <code>HEAP_SIZE</code> iterations in total:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/heap_allocation.rs </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::allocator::</span><span style="color:#b4cea8;">HEAP_SIZE</span><span>; </span><span> </span><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>many_boxes() { </span><span> </span><span style="color:#569cd6;">for</span><span> i </span><span style="color:#569cd6;">in </span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b4cea8;">HEAP_SIZE </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> x = Box::new(i); </span><span> assert_eq!(*x, i); </span><span> } </span><span>} </span></code></pre> <p>This test ensures that the allocator reuses freed memory for subsequent allocations. Each box is dropped at the end of its loop iteration, and since every box occupies more than one byte, <code>HEAP_SIZE</code> allocations would exhaust the heap if freed memory were never reused. This might seem like an obvious requirement for an allocator, but there are allocator designs that don’t do this. An example is the bump allocator design that will be explained in the next post.</p> <p>Let’s run our new integration test:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo test --test heap_allocation </span><span>[…] </span><span>Running 3 tests </span><span>simple_allocation... [ok] </span><span>large_vec... [ok] </span><span>many_boxes... [ok] </span></code></pre> <p>All three tests succeeded! 
You can also invoke <code>cargo test</code> (without the <code>--test</code> argument) to run all unit and integration tests.</p> <h2 id="summary"><a class="zola-anchor" href="#summary" aria-label="Anchor link for: summary">🔗</a>Summary</h2> <p>This post gave an introduction to dynamic memory and explained why and where it is needed. We saw how Rust’s borrow checker prevents common vulnerabilities and learned how Rust’s allocation API works.</p> <p>After creating a minimal implementation of Rust’s allocator interface using a dummy allocator, we created a proper heap memory region for our kernel. For that, we defined a virtual address range for the heap and then mapped all pages of that range to physical frames using the <code>Mapper</code> and <code>FrameAllocator</code> from the previous post.</p> <p>Finally, we added a dependency on the <code>linked_list_allocator</code> crate to add a proper allocator to our kernel. With this allocator, we were able to use <code>Box</code>, <code>Vec</code>, and other allocation and collection types from the <code>alloc</code> crate.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>While we already added heap allocation support in this post, we left most of the work to the <code>linked_list_allocator</code> crate. The next post will show in detail how an allocator can be implemented from scratch. It will present multiple possible allocator designs, show how to implement simple versions of them, and explain their advantages and drawbacks.</p> Updates in May 2019 Mon, 03 Jun 2019 00:00:00 +0000 https://os.phil-opp.com/status-update/2019-06-03/ https://os.phil-opp.com/status-update/2019-06-03/ <p>This post gives an overview of the recent updates to the <em>Writing an OS in Rust</em> blog and to the used tools. I was quite busy with my master thesis this month, so I didn’t have the time to create new content or major new features. 
However, there were quite a few minor updates.</p> <h2 id="x86-64">x86_64</h2> <ul> <li><a href="https://github.com/rust-osdev/x86_64/pull/70">Use cast crate instead of usize_conversions crate</a> (released as version 0.5.5).</li> <li><a href="https://github.com/rust-osdev/x86_64/pull/71">Make FrameAllocator an unsafe trait</a> (released as version 0.6.0).</li> <li><a href="https://github.com/rust-osdev/x86_64/pull/76">Change Port::read and PortReadOnly::read to take &amp;mut self</a> (released as version 0.7.0).</li> <li><a href="https://github.com/npmccallum">@npmccallum</a> started working on <a href="https://github.com/rust-osdev/x86_64/issues/72">moving the type declarations to a separate crate</a> to make them usable for more projects. We created the experimental <a href="https://github.com/rust-osdev/x86_64_types/">x86_64_types</a> crate for this.</li> </ul> <h2 id="cargo-xbuild">Cargo-Xbuild</h2> <ul> <li><a href="https://github.com/rust-osdev/cargo-xbuild/commit/bd73f5a1b975f1938abd5b4c17a048d2018741b7">Make backtraces optional</a> to remove the transitive dependency on the <code>cc</code> crate, which has additional <a href="https://github.com/alexcrichton/cc-rs#compile-time-requirements">compile-time requirements</a> (e.g. a working <code>gcc</code> installation). These requirements caused <a href="https://github.com/phil-opp/blog_os/issues/612">problems for some people</a>, so we decided to disable backtraces by default. Released as version 0.5.9.</li> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/32">Error when the sysroot path contains spaces</a>: This pull request adds a special error message that points to <a href="https://github.com/rust-lang/cargo/issues/6139">rust-lang/cargo#6139</a> when a sysroot path contains spaces. This should avoid the regular confusion, e.g. 
<a href="https://github.com/phil-opp/blog_os/issues/464#issuecomment-427793367">here</a>, <a href="https://github.com/phil-opp/blog_os/issues/403#issuecomment-483046786">here</a>, or <a href="https://github.com/phil-opp/blog_os/issues/403#issuecomment-487313363">here</a>.</li> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/33">Add a <code>XBUILD_SYSROOT_PATH</code> environment variable to override sysroot path</a>: This feature is useful when the default sysroot path contains a space. Released as version 0.5.10.</li> <li><a href="https://github.com/rust-osdev/cargo-xbuild/pull/34">Fix the new <code>XBUILD_SYSROOT_PATH</code> environment variable</a>. Released as version 0.5.11.</li> <li><a href="https://github.com/rust-osdev/bootimage/pull/40">Update Azure Pipelines CI script</a> <ul> <li>Build all branches instead of just <code>master</code> and the <a href="https://bors.tech/">bors</a> <code>staging</code> branch.</li> <li>Rustup is now included in the official Windows image of Azure Pipelines, so we don’t need to install it again.</li> </ul> </li> </ul> <h2 id="bootloader">Bootloader</h2> <ul> <li><a href="https://github.com/rybot666">@rybot666</a> started working on <a href="https://github.com/rust-osdev/bootloader/issues/24">porting the 16-bit assembly of the bootloader to Rust</a>.</li> </ul> <h2 id="bootimage">Bootimage</h2> <ul> <li><a href="https://github.com/toothbrush7777777">@toothbrush7777777</a> landed a pull request to <a href="https://github.com/rust-osdev/bootimage/pull/39">pad the boot image to a hard disk block size</a>. This is required for booting the image in VirtualBox. Released as version 0.7.4.</li> <li><a href="https://github.com/rust-osdev/bootimage/pull/41">Set <code>XBUILD_SYSROOT_PATH</code> when building bootloader</a>. 
Released as version 0.7.5.</li> </ul> <h2 id="blog-os">Blog OS</h2> <ul> <li><a href="https://github.com/phil-opp/blog_os/pull/600">Update to version 0.6.0 of x86_64</a>, which made the <code>FrameAllocator</code> trait unsafe to implement.</li> <li><a href="https://github.com/phil-opp/blog_os/pull/604">Use <code>-serial stdio</code> instead of <code>-serial mon:stdio</code></a> as QEMU arguments when testing.</li> <li><a href="https://github.com/phil-opp/blog_os/pull/606">Update x86_64 to version 0.7.0</a>, which changed the <code>Port::read</code> method to take <code>&amp;mut self</code> instead of <code>&amp;self</code>.</li> <li><a href="https://github.com/josephlr">@josephlr</a> <a href="https://github.com/phil-opp/blog_os/pull/609">replaced some leftover tabs with spaces</a>.</li> <li><a href="https://github.com/phil-opp/blog_os/pull/611">Rewrite <code>CompareMessage</code> struct to check the whole string</a>.</li> </ul> Updates in April 2019 Wed, 01 May 2019 00:00:00 +0000 https://os.phil-opp.com/status-update/2019-05-01/ https://os.phil-opp.com/status-update/2019-05-01/ <p>Lots of things changed in the <em>Writing an OS in Rust</em> series in the past month, both on the blog itself and in the tools behind the scenes. This post gives an overview of the most important updates.</p> <p>This post is an experiment inspired by <a href="https://this-week-in-rust.org/"><em>This Week in Rust</em></a> and similar series. The goal is to provide a resource that allows following the project more closely and staying up-to-date with the changes in the tools/libraries behind the scenes. If enough people find this useful, I will try to turn this into a semi-regular series.</p> <h2 id="bootloader">Bootloader</h2> <ul> <li>The build system of the bootloader was rewritten to perform proper linking instead of appending the kernel executable manually like before. 
The relevant pull requests are <a href="https://github.com/rust-osdev/bootloader/pull/51"><em>Rewrite build system</em></a> and <a href="https://github.com/rust-osdev/bootloader/pull/53"><em>Updates for new build system</em></a>. These (breaking) changes were released as version <code>0.5.0</code> (<a href="https://github.com/rust-osdev/bootloader/blob/master/Changelog.md#050">changelog</a>).</li> <li>To make the bootloader work with future versions of <code>bootimage</code>, a <a href="https://github.com/rust-osdev/bootloader/commit/33b8ce6059e90485c56883b23d4834d06ddfd517"><code>package.metadata.bootloader.target</code> key was added</a> to the <code>Cargo.toml</code> of the bootloader. This key specifies the name of the target JSON file, so that <code>bootimage</code> knows which <code>--target</code> argument to pass. This change was released as version <code>0.5.1</code> (<a href="https://github.com/rust-osdev/bootloader/blob/master/Changelog.md#051">changelog</a>)</li> <li>In the <a href="https://github.com/rust-osdev/bootloader/pull/55"><em>Version 0.6.0</em></a> pull request, the <code>#[cfg(not(test))]</code> attribute was removed from the <code>entry_point</code> macro. This makes it possible to use the macro together with <code>cargo xtest</code> and a custom test framework. Since the change is breaking, it was released as version <code>0.6.0</code> (<a href="https://github.com/rust-osdev/bootloader/blob/master/Changelog.md#060">changelog</a>).</li> </ul> <h2 id="bootimage">Bootimage</h2> <ul> <li>The <a href="https://github.com/rust-osdev/bootimage/pull/34"><em>Rewrite bootimage for new bootloader build system</em></a> pull request completely revamped the implementation of the crate. This was released as version <code>0.7.0</code>. See the <a href="https://github.com/rust-osdev/bootimage/blob/master/Changelog.md#070">changelog</a> for a list of changes. 
<ul> <li>The rewrite had the unintended side-effect that <code>bootimage run</code> no longer ignored executables named <code>test-*</code>, so that an additional <code>--bin</code> argument was required for specifying which executable to run. To avoid breaking users of <code>bootimage test</code>, we yanked version <code>0.7.0</code>. After <a href="https://github.com/rust-osdev/bootimage/commit/8746c15bf326cf8438a4e64ffdda332fbe59e30d">fixing the issue</a>, version <code>0.7.1</code> was released (<a href="https://github.com/rust-osdev/bootimage/blob/master/Changelog.md#071">changelog</a>).</li> </ul> </li> <li>The <a href="https://github.com/rust-osdev/bootimage/pull/36"><em>New features for <code>bootimage runner</code></em></a> pull request added support for additional arguments and various functionality for supporting <code>cargo xtest</code>. The changes were released as version <code>0.7.2</code> (<a href="https://github.com/rust-osdev/bootimage/blob/master/Changelog.md#072">changelog</a>).</li> <li>An argument parsing bug that broke the new <code>cargo bootimage</code> subcommand on Windows was <a href="https://github.com/rust-osdev/bootimage/commit/101eb43de403fd9f3cb3f044e2c263356d2c179a">fixed</a>. The fix was released as version <code>0.7.3</code>.</li> </ul> <h2 id="blog-os">Blog OS</h2> <ul> <li>Performed an <a href="https://github.com/phil-opp/blog_os/pull/575"><em>Update to new bootloader 0.5.1 and bootimage 0.7.2</em></a>. 
Apart from requiring the <code>llvm-tools-preview</code> rustup component, this only changes version numbers.</li> <li>The <a href="https://github.com/phil-opp/blog_os/pull/577"><em>Rewrite the linking section of “A Freestanding Rust Binary”</em></a> pull request updated the first post to compile for the bare-metal <code>thumbv7em-none-eabihf</code> target instead of adding linker arguments for Linux/Windows/macOS.</li> <li>Since the blog came close to the free bandwidth limit of Netlify, we needed to <a href="https://github.com/phil-opp/blog_os/pull/579"><em>Migrate from Netlify to Github Pages</em></a> to avoid additional fees.</li> <li>With the <a href="https://github.com/phil-opp/blog_os/pull/582"><em>Minimal Rust Kernel: Use a runner to make cargo xrun work</em></a> pull request, we integrated the new <code>bootimage runner</code> into the blog. <ul> <li>The required updates to the <code>post-02</code> and <code>post-03</code> branches were performed in the <a href="https://github.com/phil-opp/blog_os/pull/585"><em>Add <code>.cargo/config</code> file to post-02 branch</em></a> and <a href="https://github.com/phil-opp/blog_os/pull/586"><em>Merge the changes from #585 into the post-03 branch</em></a> pull requests.</li> </ul> </li> <li>In the <a href="https://github.com/phil-opp/blog_os/pull/584"><em>New testing post</em></a> pull request, we replaced the previous <a href="https://os.phil-opp.com/unit-testing/"><em>Unit Testing</em></a> and <a href="https://os.phil-opp.com/integration-tests/"><em>Integration Tests</em></a> with the new <a href="https://os.phil-opp.com/testing/"><em>Testing</em></a> post, which uses <code>cargo xtest</code> and a custom test framework for running tests. <ul> <li>The required updates for the <code>post-04</code> branch were performed in the <a href="https://github.com/phil-opp/blog_os/pull/587"><em>Implement code for new testing post in post-xx branches</em></a> pull request. 
The updates for the other <code>post-*</code> branches were pushed manually to avoid spamming the repository with pull requests. You can find a list of the commits in the pull request description.</li> </ul> </li> <li>The <a href="https://github.com/phil-opp/blog_os/pull/595"><em>Avoid generic impl trait parameters in BootInfoFrameAllocator</em></a> pull request made the <code>BootInfoFrameAllocator</code> non-generic by reconstructing the frame iterator on every allocation. This way, we avoid using an <code>impl Trait</code> type parameter, which would make it <a href="https://github.com/phil-opp/blog_os/issues/593">impossible to store the type in a <code>static</code></a>. See <a href="https://github.com/rust-lang/rust/issues/60367">rust-lang/rust#60367</a> for the fundamental problem.</li> </ul> Testing Sat, 27 Apr 2019 00:00:00 +0000 https://os.phil-opp.com/testing/ https://os.phil-opp.com/testing/ <p>This post explores unit and integration testing in <code>no_std</code> executables. We will use Rust’s support for custom test frameworks to execute test functions inside our kernel. To report the results out of QEMU, we will use different features of QEMU and the <code>bootimage</code> tool.</p> <span id="continue-reading"></span> <p>This blog is openly developed on <a href="https://github.com/phil-opp/blog_os">GitHub</a>. If you have any problems or questions, please open an issue there. You can also leave comments <a href="https://os.phil-opp.com/testing/#comments">at the bottom</a>. 
The complete source code for this post can be found in the <a href="https://github.com/phil-opp/blog_os/tree/post-04"><code>post-04</code></a> branch.</p> <!-- fix for zola anchor checker (target is in template): <a id="comments"> --> <!-- toc --> <h2 id="requirements"><a class="zola-anchor" href="#requirements" aria-label="Anchor link for: requirements">🔗</a>Requirements</h2> <p>This post replaces the (now deprecated) <a href="https://os.phil-opp.com/unit-testing/"><em>Unit Testing</em></a> and <a href="https://os.phil-opp.com/integration-tests/"><em>Integration Tests</em></a> posts. It assumes that you have followed the <a href="https://os.phil-opp.com/minimal-rust-kernel/"><em>A Minimal Rust Kernel</em></a> post after 2019-04-27. Mainly, it requires that you have a <code>.cargo/config.toml</code> file that <a href="https://os.phil-opp.com/minimal-rust-kernel/#set-a-default-target">sets a default target</a> and <a href="https://os.phil-opp.com/minimal-rust-kernel/#using-cargo-run">defines a runner executable</a>.</p> <h2 id="testing-in-rust"><a class="zola-anchor" href="#testing-in-rust" aria-label="Anchor link for: testing-in-rust">🔗</a>Testing in Rust</h2> <p>Rust has a <a href="https://doc.rust-lang.org/book/ch11-00-testing.html">built-in test framework</a> that is capable of running unit tests without the need to set anything up. Just create a function that checks some results through assertions and add the <code>#[test]</code> attribute to the function header. Then <code>cargo test</code> will automatically find and execute all test functions of your crate.</p> <p>Unfortunately, it’s a bit more complicated for <code>no_std</code> applications such as our kernel. The problem is that Rust’s test framework implicitly uses the built-in <a href="https://doc.rust-lang.org/test/index.html"><code>test</code></a> library, which depends on the standard library. 
This means that we can’t use the default test framework for our <code>#[no_std]</code> kernel.</p> <p>We can see this when we try to run <code>cargo test</code> in our project:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo test </span><span> Compiling blog_os v0.1.0 (/…/blog_os) </span><span>error[E0463]: can&#39;t find crate for `test` </span></code></pre> <p>Since the <code>test</code> crate depends on the standard library, it is not available for our bare metal target. While porting the <code>test</code> crate to a <code>#[no_std]</code> context <a href="https://github.com/japaric/utest">is possible</a>, it is highly unstable and requires some hacks, such as redefining the <code>panic</code> macro.</p> <h3 id="custom-test-frameworks"><a class="zola-anchor" href="#custom-test-frameworks" aria-label="Anchor link for: custom-test-frameworks">🔗</a>Custom Test Frameworks</h3> <p>Fortunately, Rust supports replacing the default test framework through the unstable <a href="https://doc.rust-lang.org/unstable-book/language-features/custom-test-frameworks.html"><code>custom_test_frameworks</code></a> feature. This feature requires no external libraries and thus also works in <code>#[no_std]</code> environments. It works by collecting all functions annotated with a <code>#[test_case]</code> attribute and then invoking a user-specified runner function with the list of tests as an argument. Thus, it gives the implementation maximal control over the test process.</p> <p>The disadvantage compared to the default test framework is that many advanced features, such as <a href="https://doc.rust-lang.org/book/ch11-01-writing-tests.html#checking-for-panics-with-should_panic"><code>should_panic</code> tests</a>, are not available. Instead, it is up to the implementation to provide such features itself if needed. 
This is ideal for us since we have a very special execution environment where the default implementations of such advanced features probably wouldn’t work anyway. For example, the <code>#[should_panic]</code> attribute relies on stack unwinding to catch the panics, which we disabled for our kernel.</p> <p>To implement a custom test framework for our kernel, we add the following to our <code>main.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#![feature(custom_test_frameworks)] </span><span>#![test_runner(crate::test_runner)] </span><span> </span><span>#[cfg(test)] </span><span style="color:#569cd6;">pub fn </span><span>test_runner(tests: </span><span style="color:#569cd6;">&amp;</span><span>[</span><span style="color:#569cd6;">&amp;</span><span>dyn Fn()]) { </span><span> println!(</span><span style="color:#d69d85;">&quot;Running </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;"> tests&quot;</span><span>, tests.len()); </span><span> </span><span style="color:#569cd6;">for</span><span> test </span><span style="color:#569cd6;">in</span><span> tests { </span><span> test(); </span><span> } </span><span>} </span></code></pre> <p>Our runner just prints a short debug message and then calls each test function in the list. The argument type <code>&amp;[&amp;dyn Fn()]</code> is a <a href="https://doc.rust-lang.org/std/primitive.slice.html"><em>slice</em></a> of <a href="https://doc.rust-lang.org/1.30.0/book/first-edition/trait-objects.html"><em>trait object</em></a> references of the <a href="https://doc.rust-lang.org/std/ops/trait.Fn.html"><em>Fn()</em></a> trait. It is basically a list of references to types that can be called like a function. 
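</p> <p>To get a feel for this argument type outside the kernel, here is a minimal host-side sketch that assumes the standard library is available. The names <code>run_all</code> and <code>demo_runner</code> are made up for illustration and are not part of our kernel:</p>

```rust
// Host-side sketch, assuming the standard library is available.
// `run_all` and `demo_runner` are hypothetical names for illustration.
use std::cell::Cell;

/// Calls every entry of `tests` once, in order, and returns how many ran.
fn run_all(tests: &[&dyn Fn()]) -> usize {
    for test in tests {
        test();
    }
    tests.len()
}

/// Two closures with different concrete types fit into the same slice
/// because each reference is coerced to `&dyn Fn()`.
fn demo_runner() -> u32 {
    let hits = Cell::new(0u32);
    let first = || hits.set(hits.get() + 1);
    let second = || hits.set(hits.get() + 10);
    run_all(&[&first, &second]);
    hits.get()
}
```

<p>In our kernel, <code>test_runner</code> plays the role that <code>run_all</code> plays in this sketch.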
Since the function is useless for non-test runs, we use the <code>#[cfg(test)]</code> attribute to include it only for tests.</p> <p>When we run <code>cargo test</code> now, it succeeds (if it doesn’t, see the note below). However, we still see our “Hello World” instead of the message from our <code>test_runner</code>. The reason is that our <code>_start</code> function is still used as the entry point. The custom test frameworks feature generates a <code>main</code> function that calls <code>test_runner</code>, but this function is ignored because we use the <code>#[no_main]</code> attribute and provide our own entry point.</p> <div class = "warning"> <p><strong>Note:</strong> There is currently a bug in cargo that leads to “duplicate lang item” errors on <code>cargo test</code> in some cases. It occurs when you have set <code>panic = "abort"</code> for a profile in your <code>Cargo.toml</code>. Try removing it, then <code>cargo test</code> should work. Alternatively, if that doesn’t work, then add <code>panic-abort-tests = true</code> to the <code>[unstable]</code> section of your <code>.cargo/config.toml</code> file. See the <a href="https://github.com/rust-lang/cargo/issues/7359">cargo issue</a> for more information on this.</p> </div> <p>To make <code>_start</code> run our tests, we first need to change the name of the generated function to something other than <code>main</code> through the <code>reexport_test_harness_main</code> attribute. 
Then we can call the renamed function from our <code>_start</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#![reexport_test_harness_main </span><span style="color:#569cd6;">= </span><span style="color:#d69d85;">&quot;test_main&quot;</span><span>] </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> </span><span> #[cfg(test)] </span><span> test_main(); </span><span> </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>We set the name of the test framework entry function to <code>test_main</code> and call it from our <code>_start</code> entry point. We use <a href="https://doc.rust-lang.org/1.30.0/book/first-edition/conditional-compilation.html">conditional compilation</a> to add the call to <code>test_main</code> only in test contexts because the function is not generated on a normal run.</p> <p>When we now execute <code>cargo test</code>, we see the “Running 0 tests” message from our <code>test_runner</code> on the screen. 
We are now ready to create our first test function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>trivial_assertion() { </span><span> print!(</span><span style="color:#d69d85;">&quot;trivial assertion... &quot;</span><span>); </span><span> assert_eq!(</span><span style="color:#b5cea8;">1</span><span>, </span><span style="color:#b5cea8;">1</span><span>); </span><span> println!(</span><span style="color:#d69d85;">&quot;[ok]&quot;</span><span>); </span><span>} </span></code></pre> <p>When we run <code>cargo test</code> now, we see the following output:</p> <p><img src="https://os.phil-opp.com/testing/qemu-test-runner-output.png" alt="QEMU printing “Hello World!”, “Running 1 tests”, and “trivial assertion… [ok]”" /></p> <p>The <code>tests</code> slice passed to our <code>test_runner</code> function now contains a reference to the <code>trivial_assertion</code> function. From the <code>trivial assertion... [ok]</code> output on the screen, we see that the test was called and that it succeeded.</p> <p>After executing the tests, our <code>test_runner</code> returns to the <code>test_main</code> function, which in turn returns to our <code>_start</code> entry point function. At the end of <code>_start</code>, we enter an endless loop because the entry point function is not allowed to return. This is a problem, because we want <code>cargo test</code> to exit after running all tests.</p> <h2 id="exiting-qemu"><a class="zola-anchor" href="#exiting-qemu" aria-label="Anchor link for: exiting-qemu">🔗</a>Exiting QEMU</h2> <p>Right now, we have an endless loop at the end of our <code>_start</code> function and need to close QEMU manually on each execution of <code>cargo test</code>. 
This is unfortunate because we also want to run <code>cargo test</code> in scripts without user interaction. The clean solution to this would be to implement a proper way to shut down our OS. Unfortunately, this is relatively complex because it requires implementing support for either the <a href="https://wiki.osdev.org/APM">APM</a> or <a href="https://wiki.osdev.org/ACPI">ACPI</a> power management standard.</p> <p>Luckily, there is an escape hatch: QEMU supports a special <code>isa-debug-exit</code> device, which provides an easy way to exit QEMU from the guest system. To enable it, we need to pass a <code>-device</code> argument to QEMU. We can do so by adding a <code>package.metadata.bootimage.test-args</code> configuration key in our <code>Cargo.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">package.metadata.bootimage</span><span>] </span><span style="color:#569cd6;">test-args </span><span>= [</span><span style="color:#d69d85;">&quot;-device&quot;</span><span>, </span><span style="color:#d69d85;">&quot;isa-debug-exit,iobase=0xf4,iosize=0x04&quot;</span><span>] </span></code></pre> <p>The <code>bootimage runner</code> appends the <code>test-args</code> to the default QEMU command for all test executables. 
For a normal <code>cargo run</code>, the arguments are ignored.</p> <p>Together with the device name (<code>isa-debug-exit</code>), we pass the two parameters <code>iobase</code> and <code>iosize</code> that specify the <em>I/O port</em> through which the device can be reached from our kernel.</p> <h3 id="i-o-ports"><a class="zola-anchor" href="#i-o-ports" aria-label="Anchor link for: i-o-ports">🔗</a>I/O Ports</h3> <p>There are two different approaches for communicating between the CPU and peripheral hardware on x86, <strong>memory-mapped I/O</strong> and <strong>port-mapped I/O</strong>. We already used memory-mapped I/O for accessing the <a href="https://os.phil-opp.com/vga-text-mode/">VGA text buffer</a> through the memory address <code>0xb8000</code>. This address is not mapped to RAM but to some memory on the VGA device.</p> <p>In contrast, port-mapped I/O uses a separate I/O bus for communication. Each connected peripheral has one or more port numbers. To communicate with such an I/O port, there are special CPU instructions called <code>in</code> and <code>out</code>, which take a port number and a data byte (there are also variations of these commands that allow sending a <code>u16</code> or <code>u32</code>).</p> <p>The <code>isa-debug-exit</code> device uses port-mapped I/O. The <code>iobase</code> parameter specifies on which port address the device should live (<code>0xf4</code> is a <a href="https://wiki.osdev.org/I/O_Ports#The_list">generally unused</a> port on the x86’s IO bus) and the <code>iosize</code> specifies the port size (<code>0x04</code> means four bytes).</p> <h3 id="using-the-exit-device"><a class="zola-anchor" href="#using-the-exit-device" aria-label="Anchor link for: using-the-exit-device">🔗</a>Using the Exit Device</h3> <p>The functionality of the <code>isa-debug-exit</code> device is very simple. 
When a <code>value</code> is written to the I/O port specified by <code>iobase</code>, it causes QEMU to exit with <a href="https://en.wikipedia.org/wiki/Exit_status">exit status</a> <code>(value &lt;&lt; 1) | 1</code>. So when we write <code>0</code> to the port, QEMU will exit with exit status <code>(0 &lt;&lt; 1) | 1 = 1</code>, and when we write <code>1</code> to the port, it will exit with exit status <code>(1 &lt;&lt; 1) | 1 = 3</code>.</p> <p>Instead of manually invoking the <code>in</code> and <code>out</code> assembly instructions, we use the abstractions provided by the <a href="https://docs.rs/x86_64/0.14.2/x86_64/"><code>x86_64</code></a> crate. To add a dependency on that crate, we add it to the <code>dependencies</code> section in our <code>Cargo.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">x86_64 </span><span>= </span><span style="color:#d69d85;">&quot;0.14.2&quot; </span></code></pre> <p>Now we can use the <a href="https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html"><code>Port</code></a> type provided by the crate to create an <code>exit_qemu</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[derive(Debug, Clone, Copy, PartialEq, Eq)] </span><span>#[repr(u32)] </span><span style="color:#569cd6;">pub enum </span><span>QemuExitCode { </span><span> Success = </span><span style="color:#b5cea8;">0x10</span><span>, </span><span> Failed = </span><span style="color:#b5cea8;">0x11</span><span>, </span><span>} </span><span> </span><span style="color:#569cd6;">pub fn 
</span><span>exit_qemu(exit_code: QemuExitCode) { </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::port::Port; </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#569cd6;">let mut</span><span> port = Port::new(</span><span style="color:#b5cea8;">0xf4</span><span>); </span><span> port.write(exit_code </span><span style="color:#569cd6;">as u32</span><span>); </span><span> } </span><span>} </span></code></pre> <p>The function creates a new <a href="https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html"><code>Port</code></a> at <code>0xf4</code>, which is the <code>iobase</code> of the <code>isa-debug-exit</code> device. Then it writes the passed exit code to the port. We use <code>u32</code> because we specified the <code>iosize</code> of the <code>isa-debug-exit</code> device as 4 bytes. Both operations are unsafe because writing to an I/O port can generally result in arbitrary behavior.</p> <p>To specify the exit status, we create a <code>QemuExitCode</code> enum. The idea is to exit with the success exit code if all tests succeeded and with the failure exit code otherwise. The enum is marked as <code>#[repr(u32)]</code> to represent each variant by a <code>u32</code> integer. We use the exit code <code>0x10</code> for success and <code>0x11</code> for failure. The actual exit codes don’t matter much, as long as they don’t clash with the default exit codes of QEMU. For example, using exit code <code>0</code> for success is not a good idea because it becomes <code>(0 &lt;&lt; 1) | 1 = 1</code> after the transformation, which is the default exit code when QEMU fails to run. 
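</p> <p>We can check this arithmetic with a small host-side model of the transformation. It assumes only the <code>(value &lt;&lt; 1) | 1</code> formula from above; <code>qemu_exit_status</code> is a hypothetical helper that performs no port I/O and is not part of QEMU or any crate:</p>

```rust
// Host-side model of the exit status formula; it performs no port I/O.
// `qemu_exit_status` is a hypothetical helper used only for illustration.

/// Exit status that results after `value` is written to the exit device,
/// according to the `(value << 1) | 1` formula.
fn qemu_exit_status(value: u32) -> u32 {
    (value << 1) | 1
}
```

<p>In this model, our <code>Success</code> value <code>0x10</code> maps to <code>33</code> and our <code>Failed</code> value <code>0x11</code> maps to <code>35</code>, whereas a success value of <code>0</code> would map to <code>1</code>, the same status that QEMU reports when it fails to run.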
So we could not differentiate a QEMU error from a successful test run.</p> <p>We can now update our <code>test_runner</code> to exit QEMU after all tests have run:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">fn </span><span>test_runner(tests: </span><span style="color:#569cd6;">&amp;</span><span>[</span><span style="color:#569cd6;">&amp;</span><span>dyn Fn()]) { </span><span> println!(</span><span style="color:#d69d85;">&quot;Running </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;"> tests&quot;</span><span>, tests.len()); </span><span> </span><span style="color:#569cd6;">for</span><span> test </span><span style="color:#569cd6;">in</span><span> tests { </span><span> test(); </span><span> } </span><span> </span><span style="color:#608b4e;">// new </span><span> exit_qemu(QemuExitCode::Success); </span><span>} </span></code></pre> <p>When we run <code>cargo test</code> now, we see that QEMU immediately closes after executing the tests. 
The problem is that <code>cargo test</code> interprets the test as failed even though we passed our <code>Success</code> exit code:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo test </span><span> Finished dev [unoptimized + debuginfo] target(s) in 0.03s </span><span> Running target/x86_64-blog_os/debug/deps/blog_os-5804fc7d2dd4c9be </span><span>Building bootloader </span><span> Compiling bootloader v0.5.3 (/home/philipp/Documents/bootloader) </span><span> Finished release [optimized + debuginfo] target(s) in 1.07s </span><span>Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/ </span><span> deps/bootimage-blog_os-5804fc7d2dd4c9be.bin -device isa-debug-exit,iobase=0xf4, </span><span> iosize=0x04` </span><span>error: test failed, to rerun pass &#39;--bin blog_os&#39; </span></code></pre> <p>The problem is that <code>cargo test</code> considers all error codes other than <code>0</code> as failure.</p> <h3 id="success-exit-code"><a class="zola-anchor" href="#success-exit-code" aria-label="Anchor link for: success-exit-code">🔗</a>Success Exit Code</h3> <p>To work around this, <code>bootimage</code> provides a <code>test-success-exit-code</code> configuration key that maps a specified exit code to the exit code <code>0</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">package.metadata.bootimage</span><span>] </span><span style="color:#569cd6;">test-args </span><span>= [</span><span style="color:#ff3333;">…</span><span>] </span><span style="color:#569cd6;">test-success-exit-code </span><span>= </span><span style="color:#b5cea8;">33 </span><span style="color:#608b4e;"># (0x10 &lt;&lt; 1) | 1 </span></code></pre> <p>With this configuration, <code>bootimage</code> maps our success exit code to exit code 
0, so that <code>cargo test</code> correctly recognizes the success case and does not count the test as failed.</p> <p>Our test runner now automatically closes QEMU and correctly reports the test results. We still see the QEMU window open for a very short time, but that is not long enough to read the results. It would be nice if we could print the test results to the console instead, so we can still see them after QEMU exits.</p> <h2 id="printing-to-the-console"><a class="zola-anchor" href="#printing-to-the-console" aria-label="Anchor link for: printing-to-the-console">🔗</a>Printing to the Console</h2> <p>To see the test output on the console, we need to send the data from our kernel to the host system somehow. There are various ways to achieve this, for example, by sending the data over a TCP network interface. However, setting up a networking stack is quite a complex task, so we will choose a simpler solution instead.</p> <h3 id="serial-port"><a class="zola-anchor" href="#serial-port" aria-label="Anchor link for: serial-port">🔗</a>Serial Port</h3> <p>A simple way to send the data is to use the <a href="https://en.wikipedia.org/wiki/Serial_port">serial port</a>, an old interface standard that is no longer found in modern computers. It is easy to program, and QEMU can redirect the bytes sent over serial to the host’s standard output or a file.</p> <p>The chips implementing a serial interface are called <a href="https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter">UARTs</a>. There are <a href="https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter#UART_models">lots of UART models</a> on x86, but fortunately the only differences between them are some advanced features we don’t need. 
The common UARTs today are all compatible with the <a href="https://en.wikipedia.org/wiki/16550_UART">16550 UART</a>, so we will use that model for our testing framework.</p> <p>We will use the <a href="https://docs.rs/uart_16550"><code>uart_16550</code></a> crate to initialize the UART and send data over the serial port. To add it as a dependency, we update our <code>Cargo.toml</code> and <code>main.rs</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">uart_16550 </span><span>= </span><span style="color:#d69d85;">&quot;0.2.0&quot; </span></code></pre> <p>The <code>uart_16550</code> crate contains a <code>SerialPort</code> struct that represents the UART registers, but we still need to construct an instance of it ourselves. For that, we create a new <code>serial</code> module with the following content:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">mod </span><span>serial; </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/serial.rs </span><span> </span><span style="color:#569cd6;">use </span><span>uart_16550::SerialPort; </span><span style="color:#569cd6;">use </span><span>spin::Mutex; </span><span style="color:#569cd6;">use </span><span>lazy_static::lazy_static; </span><span> </span><span>lazy_static! 
{ </span><span> </span><span style="color:#569cd6;">pub static ref </span><span style="color:#b4cea8;">SERIAL1</span><span>: Mutex&lt;SerialPort&gt; = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> serial_port = </span><span style="color:#569cd6;">unsafe </span><span>{ SerialPort::new(</span><span style="color:#b5cea8;">0x3F8</span><span>) }; </span><span> serial_port.init(); </span><span> Mutex::new(serial_port) </span><span> }; </span><span>} </span></code></pre> <p>Like with the <a href="https://os.phil-opp.com/vga-text-mode/#lazy-statics">VGA text buffer</a>, we use <code>lazy_static</code> and a spinlock to create a <code>static</code> writer instance. By using <code>lazy_static</code> we can ensure that the <code>init</code> method is called exactly once on its first use.</p> <p>Like the <code>isa-debug-exit</code> device, the UART is programmed using port I/O. Since the UART is more complex, it uses multiple I/O ports for programming different device registers. The unsafe <code>SerialPort::new</code> function expects the address of the first I/O port of the UART as an argument, from which it can calculate the addresses of all needed ports. 
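</p> <p>As a rough illustration of that calculation, the following host-side sketch lists the conventional 16550 register offsets relative to a base port. The struct, its field names, and <code>uart_ports</code> are hypothetical and only meant to show the arithmetic; they are not the API of the <code>uart_16550</code> crate:</p>

```rust
// Sketch of the conventional 16550 register layout relative to a base port.
// The struct and function names are hypothetical, not the crate's API.

#[derive(Debug, PartialEq, Eq)]
struct UartPorts {
    data: u16,             // base + 0: transmit/receive data
    interrupt_enable: u16, // base + 1: interrupt enable register
    fifo_control: u16,     // base + 2: FIFO control register
    line_control: u16,     // base + 3: line control register
    modem_control: u16,    // base + 4: modem control register
    line_status: u16,      // base + 5: line status register
}

/// Derives the individual port addresses from the first port of the UART.
fn uart_ports(base: u16) -> UartPorts {
    UartPorts {
        data: base,
        interrupt_enable: base + 1,
        fifo_control: base + 2,
        line_control: base + 3,
        modem_control: base + 4,
        line_status: base + 5,
    }
}
```

<p>With the base port used in our <code>serial</code> module, the line status register of this layout ends up at <code>0x3FD</code>.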
We’re passing the port address <code>0x3F8</code>, which is the standard port number for the first serial interface.</p> <p>To make the serial port easily usable, we add <code>serial_print!</code> and <code>serial_println!</code> macros:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/serial.rs </span><span> </span><span>#[doc(hidden)] </span><span style="color:#569cd6;">pub fn </span><span>_print(args: ::core::fmt::Arguments) { </span><span> </span><span style="color:#569cd6;">use </span><span>core::fmt::Write; </span><span> </span><span style="color:#b4cea8;">SERIAL1</span><span>.lock().write_fmt(args).expect(</span><span style="color:#d69d85;">&quot;Printing to serial failed&quot;</span><span>); </span><span>} </span><span> </span><span style="color:#608b4e;">/// Prints to the host through the serial interface. </span><span>#[macro_export] </span><span>macro_rules! serial_print { </span><span> (</span><span style="color:#569cd6;">$</span><span>($arg:</span><span style="color:#569cd6;">tt</span><span>)</span><span style="color:#569cd6;">*</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> $crate::serial::_print(format_args!(</span><span style="color:#569cd6;">$</span><span>($arg)*)); </span><span> }; </span><span>} </span><span> </span><span style="color:#608b4e;">/// Prints to the host through the serial interface, appending a newline. </span><span>#[macro_export] </span><span>macro_rules! 
serial_println { </span><span> () </span><span style="color:#569cd6;">=&gt; </span><span>($crate::serial_print</span><span style="color:#569cd6;">!</span><span>(</span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>)); </span><span> ($fmt:</span><span style="color:#569cd6;">expr</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>($crate::serial_print</span><span style="color:#569cd6;">!</span><span>(concat!($fmt, </span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>))); </span><span> ($fmt:</span><span style="color:#569cd6;">expr</span><span>, </span><span style="color:#569cd6;">$</span><span>($arg:</span><span style="color:#569cd6;">tt</span><span>)</span><span style="color:#569cd6;">*</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>($crate::serial_print</span><span style="color:#569cd6;">!</span><span>( </span><span> concat!($fmt, </span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>), </span><span style="color:#569cd6;">$</span><span>($arg)*)); </span><span>} </span></code></pre> <p>The implementation is very similar to the implementation of our <code>print</code> and <code>println</code> macros. 
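</p> <p>The core idea behind both sets of macros is that any type implementing <code>fmt::Write</code> can receive formatted output through <code>write_fmt</code>. A minimal host-side sketch, assuming the standard library and using a hypothetical in-memory <code>BufferPort</code> in place of real hardware, could look like this:</p>

```rust
// Host-side sketch: a stand-in "port" that collects bytes in memory.
// `BufferPort`, `buffer_print!`, and `demo_print` are hypothetical names.
use std::fmt::{self, Write};

struct BufferPort {
    bytes: Vec<u8>,
}

impl Write for BufferPort {
    // Implementing `write_str` is enough; `write_fmt` is provided by the trait.
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.bytes.extend_from_slice(s.as_bytes());
        Ok(())
    }
}

// Forwards the format arguments to the port, similar to `serial_print!`.
macro_rules! buffer_print {
    ($port:expr, $($arg:tt)*) => {
        $port.write_fmt(format_args!($($arg)*)).expect("printing failed")
    };
}

/// Formats one message into a fresh port and returns the collected bytes.
fn demo_print() -> Vec<u8> {
    let mut port = BufferPort { bytes: Vec::new() };
    buffer_print!(port, "Running {} tests\n", 1);
    port.bytes
}
```

<p>Our kernel can skip the <code>impl</code> block from this sketch.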
Since the <code>SerialPort</code> type already implements the <a href="https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html"><code>fmt::Write</code></a> trait, we don’t need to provide our own implementation.</p> <p>Now we can print to the serial interface instead of the VGA text buffer in our test code:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[cfg(test)] </span><span style="color:#569cd6;">fn </span><span>test_runner(tests: </span><span style="color:#569cd6;">&amp;</span><span>[</span><span style="color:#569cd6;">&amp;</span><span>dyn Fn()]) { </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;Running {} tests&quot;</span><span>, tests.len()); </span><span> […] </span><span>} </span><span> </span><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>trivial_assertion() { </span><span> serial_print!(</span><span style="color:#d69d85;">&quot;trivial assertion... 
&quot;</span><span>); </span><span> assert_eq!(</span><span style="color:#b5cea8;">1</span><span>, </span><span style="color:#b5cea8;">1</span><span>); </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;[ok]&quot;</span><span>); </span><span>} </span></code></pre> <p>Note that the <code>serial_println</code> macro lives directly under the root namespace because we used the <code>#[macro_export]</code> attribute, so importing it through <code>use crate::serial::serial_println</code> will not work.</p> <h3 id="qemu-arguments"><a class="zola-anchor" href="#qemu-arguments" aria-label="Anchor link for: qemu-arguments">🔗</a>QEMU Arguments</h3> <p>To see the serial output from QEMU, we need to use the <code>-serial</code> argument to redirect the output to stdout:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">package.metadata.bootimage</span><span>] </span><span style="color:#569cd6;">test-args </span><span>= [ </span><span> </span><span style="color:#d69d85;">&quot;-device&quot;</span><span>, </span><span style="color:#d69d85;">&quot;isa-debug-exit,iobase=0xf4,iosize=0x04&quot;</span><span>, </span><span style="color:#d69d85;">&quot;-serial&quot;</span><span>, </span><span style="color:#d69d85;">&quot;stdio&quot; </span><span>] </span></code></pre> <p>When we run <code>cargo test</code> now, we see the test output directly in the console:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo test </span><span> Finished dev [unoptimized + debuginfo] target(s) in 0.02s </span><span> Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a </span><span>Building bootloader </span><span> Finished release [optimized + debuginfo] target(s) in 0.02s </span><span>Running: `qemu-system-x86_64 -drive 
format=raw,file=/…/target/x86_64-blog_os/debug/ </span><span> deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device </span><span> isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio` </span><span>Running 1 tests </span><span>trivial assertion... [ok] </span></code></pre> <p>However, when a test fails, we still see the output inside QEMU because our panic handler still uses <code>println</code>. To simulate this, we can change the assertion in our <code>trivial_assertion</code> test to <code>assert_eq!(0, 1)</code>:</p> <p><img src="https://os.phil-opp.com/testing/qemu-failed-test.png" alt="QEMU printing “Hello World!” and “panicked at ‘assertion failed: (left == right) left: 0, right: 1’, src/main.rs:55:5”" /></p> <p>We see that the panic message is still printed to the VGA buffer, while the other test output is printed to the serial port. The panic message is quite useful, so it would be helpful to see it in the console too.</p> <h3 id="print-an-error-message-on-panic"><a class="zola-anchor" href="#print-an-error-message-on-panic" aria-label="Anchor link for: print-an-error-message-on-panic">🔗</a>Print an Error Message on Panic</h3> <p>To exit QEMU with an error message on a panic, we can use <a href="https://doc.rust-lang.org/1.30.0/book/first-edition/conditional-compilation.html">conditional compilation</a> to select a different panic handler in testing mode:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#608b4e;">// our existing panic handler </span><span>#[cfg(not(test))] </span><span style="color:#608b4e;">// new attribute </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, info); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span><span> </span><span style="color:#608b4e;">// our panic handler in test mode </span><span>#[cfg(test)] </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;[failed]</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>); </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;Error: {}</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>, info); </span><span> exit_qemu(QemuExitCode::Failed); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>For our test panic handler, we use <code>serial_println</code> instead of <code>println</code> and then exit QEMU with a failure exit code. 
Note that we still need an endless <code>loop</code> after the <code>exit_qemu</code> call because the compiler does not know that the <code>isa-debug-exit</code> device causes a program exit.</p> <p>Now QEMU also exits for failed tests and prints a useful error message on the console:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo test </span><span> Finished dev [unoptimized + debuginfo] target(s) in 0.02s </span><span> Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a </span><span>Building bootloader </span><span> Finished release [optimized + debuginfo] target(s) in 0.02s </span><span>Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/ </span><span> deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device </span><span> isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio` </span><span>Running 1 tests </span><span>trivial assertion... [failed] </span><span> </span><span>Error: panicked at &#39;assertion failed: `(left == right)` </span><span> left: `0`, </span><span> right: `1`&#39;, src/main.rs:65:5 </span></code></pre> <p>Since we see all test output on the console now, we no longer need the QEMU window that pops up for a short time, so we can hide it completely.</p> <h3 id="hiding-qemu"><a class="zola-anchor" href="#hiding-qemu" aria-label="Anchor link for: hiding-qemu">🔗</a>Hiding QEMU</h3> <p>Since we report the complete test results using the <code>isa-debug-exit</code> device and the serial port, we don’t need the QEMU window anymore. 
We can hide it by passing the <code>-display none</code> argument to QEMU:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">package.metadata.bootimage</span><span>] </span><span style="color:#569cd6;">test-args </span><span>= [ </span><span> </span><span style="color:#d69d85;">&quot;-device&quot;</span><span>, </span><span style="color:#d69d85;">&quot;isa-debug-exit,iobase=0xf4,iosize=0x04&quot;</span><span>, </span><span style="color:#d69d85;">&quot;-serial&quot;</span><span>, </span><span style="color:#d69d85;">&quot;stdio&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;-display&quot;</span><span>, </span><span style="color:#d69d85;">&quot;none&quot; </span><span>] </span></code></pre> <p>Now QEMU runs completely in the background and no window gets opened anymore. This is not only less annoying, but also allows our test framework to run in environments without a graphical user interface, such as CI services or <a href="https://en.wikipedia.org/wiki/Secure_Shell">SSH</a> connections.</p> <h3 id="timeouts"><a class="zola-anchor" href="#timeouts" aria-label="Anchor link for: timeouts">🔗</a>Timeouts</h3> <p>Since <code>cargo test</code> waits until the test runner exits, a test that never returns can block the test runner forever. That’s unfortunate, but not a big problem in practice since it’s usually easy to avoid endless loops. 
In our case, however, endless loops can occur in various situations:</p> <ul> <li>The bootloader fails to load our kernel, which causes the system to reboot endlessly.</li> <li>The BIOS/UEFI firmware fails to load the bootloader, which causes the same endless rebooting.</li> <li>The CPU enters a <code>loop {}</code> statement at the end of some of our functions, for example because the QEMU exit device doesn’t work properly.</li> <li>The hardware causes a system reset, for example when a CPU exception is not caught (explained in a future post).</li> </ul> <p>Since endless loops can occur in so many situations, the <code>bootimage</code> tool sets a timeout of 5 minutes for each test executable by default. If the test does not finish within this time, it is marked as failed and a “Timed Out” error is printed to the console. This feature ensures that tests that are stuck in an endless loop don’t block <code>cargo test</code> forever.</p> <p>You can try it yourself by adding a <code>loop {}</code> statement in the <code>trivial_assertion</code> test. When you run <code>cargo test</code>, you see that the test is marked as timed out after 5 minutes. 
The timeout duration is <a href="https://github.com/rust-osdev/bootimage#configuration">configurable</a> through a <code>test-timeout</code> key in the Cargo.toml:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">package.metadata.bootimage</span><span>] </span><span style="color:#569cd6;">test-timeout </span><span>= </span><span style="color:#b5cea8;">300 </span><span style="color:#608b4e;"># (in seconds) </span></code></pre> <p>If you don’t want to wait 5 minutes for the <code>trivial_assertion</code> test to time out, you can temporarily decrease the above value.</p> <h3 id="insert-printing-automatically"><a class="zola-anchor" href="#insert-printing-automatically" aria-label="Anchor link for: insert-printing-automatically">🔗</a>Insert Printing Automatically</h3> <p>Our <code>trivial_assertion</code> test currently needs to print its own status information using <code>serial_print!</code>/<code>serial_println!</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>trivial_assertion() { </span><span> serial_print!(</span><span style="color:#d69d85;">&quot;trivial assertion... &quot;</span><span>); </span><span> assert_eq!(</span><span style="color:#b5cea8;">1</span><span>, </span><span style="color:#b5cea8;">1</span><span>); </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;[ok]&quot;</span><span>); </span><span>} </span></code></pre> <p>Manually adding these print statements for every test we write is cumbersome, so let’s update our <code>test_runner</code> to print these messages automatically. 
To do that, we need to create a new <code>Testable</code> trait:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">pub trait </span><span>Testable { </span><span> </span><span style="color:#569cd6;">fn </span><span>run(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; (); </span><span>} </span></code></pre> <p>The trick now is to implement this trait for all types <code>T</code> that implement the <a href="https://doc.rust-lang.org/stable/core/ops/trait.Fn.html"><code>Fn()</code> trait</a>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">impl</span><span>&lt;T&gt; Testable </span><span style="color:#569cd6;">for </span><span>T </span><span style="color:#569cd6;">where </span><span> T: Fn(), </span><span>{ </span><span> </span><span style="color:#569cd6;">fn </span><span>run(</span><span style="color:#569cd6;">&amp;</span><span>self) { </span><span> serial_print!(</span><span style="color:#d69d85;">&quot;{}...</span><span style="color:#e3bbab;">\t</span><span style="color:#d69d85;">&quot;</span><span>, core::any::type_name::&lt;T&gt;()); </span><span> self(); </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;[ok]&quot;</span><span>); </span><span> } </span><span>} </span></code></pre> <p>We implement the <code>run</code> function by first printing the function name using the <a href="https://doc.rust-lang.org/stable/core/any/fn.type_name.html"><code>any::type_name</code></a> function. This function is implemented directly in the compiler and returns a string description of every type. 
For functions, the type is their name, so this is exactly what we want in this case. The <code>\t</code> character is the <a href="https://en.wikipedia.org/wiki/Tab_key#Tab_characters">tab character</a>, which adds some alignment to the <code>[ok]</code> messages.</p> <p>After printing the function name, we invoke the test function through <code>self()</code>. This only works because we require that <code>self</code> implements the <code>Fn()</code> trait. After the test function returns, we print <code>[ok]</code> to indicate that the function did not panic.</p> <p>The last step is to update our <code>test_runner</code> to use the new <code>Testable</code> trait:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[cfg(test)] </span><span style="color:#569cd6;">pub fn </span><span>test_runner(tests: </span><span style="color:#569cd6;">&amp;</span><span>[</span><span style="color:#569cd6;">&amp;</span><span>dyn Testable]) { </span><span style="color:#608b4e;">// new </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;Running {} tests&quot;</span><span>, tests.len()); </span><span> </span><span style="color:#569cd6;">for</span><span> test </span><span style="color:#569cd6;">in</span><span> tests { </span><span> test.run(); </span><span style="color:#608b4e;">// new </span><span> } </span><span> exit_qemu(QemuExitCode::Success); </span><span>} </span></code></pre> <p>The only two changes are the type of the <code>tests</code> argument from <code>&amp;[&amp;dyn Fn()]</code> to <code>&amp;[&amp;dyn Testable]</code> and the fact that we now call <code>test.run()</code> instead of <code>test()</code>.</p> <p>We can now remove the print statements from our <code>trivial_assertion</code> test since they’re now printed automatically:</p> <pre data-lang="rust" 
style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>trivial_assertion() { </span><span> assert_eq!(</span><span style="color:#b5cea8;">1</span><span>, </span><span style="color:#b5cea8;">1</span><span>); </span><span>} </span></code></pre> <p>The <code>cargo test</code> output now looks like this:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>Running 1 tests </span><span>blog_os::trivial_assertion... [ok] </span></code></pre> <p>The function name now includes the full path to the function, which is useful when test functions in different modules have the same name. Otherwise, the output looks the same as before, but we no longer need to add print statements to our tests manually.</p> <h2 id="testing-the-vga-buffer"><a class="zola-anchor" href="#testing-the-vga-buffer" aria-label="Anchor link for: testing-the-vga-buffer">🔗</a>Testing the VGA Buffer</h2> <p>Now that we have a working test framework, we can create a few tests for our VGA buffer implementation. First, we create a very simple test to verify that <code>println</code> works without panicking:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>test_println_simple() { </span><span> println!(</span><span style="color:#d69d85;">&quot;test_println_simple output&quot;</span><span>); </span><span>} </span></code></pre> <p>The test just prints something to the VGA buffer. 
If it finishes without panicking, it means that the <code>println</code> invocation did not panic either.</p> <p>To ensure that no panic occurs even if many lines are printed and lines are shifted off the screen, we can create another test:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>test_println_many() { </span><span> </span><span style="color:#569cd6;">for _ in </span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">200 </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;test_println_many output&quot;</span><span>); </span><span> } </span><span>} </span></code></pre> <p>We can also create a test function to verify that the printed lines really appear on the screen:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>test_println_output() { </span><span> </span><span style="color:#569cd6;">let</span><span> s = </span><span style="color:#d69d85;">&quot;Some test string that fits on a single line&quot;</span><span>; </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, s); </span><span> </span><span style="color:#569cd6;">for </span><span>(i, c) </span><span style="color:#569cd6;">in</span><span> s.chars().enumerate() { </span><span> </span><span style="color:#569cd6;">let</span><span> screen_char = </span><span style="color:#b4cea8;">WRITER</span><span>.lock().buffer.chars[</span><span 
style="color:#b4cea8;">BUFFER_HEIGHT </span><span>- </span><span style="color:#b5cea8;">2</span><span>][i].read(); </span><span> assert_eq!(</span><span style="color:#569cd6;">char</span><span>::from(screen_char.ascii_character), c); </span><span> } </span><span>} </span></code></pre> <p>The function defines a test string, prints it using <code>println</code>, and then iterates over the screen characters of the static <code>WRITER</code>, which represents the VGA text buffer. Since <code>println</code> prints to the last screen line and then immediately appends a newline, the string should appear on line <code>BUFFER_HEIGHT - 2</code>.</p> <p>By using <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate"><code>enumerate</code></a>, we count the number of iterations in the variable <code>i</code>, which we then use for loading the screen character corresponding to <code>c</code>. By comparing the <code>ascii_character</code> of the screen character with <code>c</code>, we ensure that each character of the string really appears in the VGA text buffer.</p> <p>As you can imagine, we could create many more test functions. For example, we could add a function that tests that no panic occurs when printing very long lines and that they’re wrapped correctly, or a function that tests that newlines, non-printable characters, and non-Unicode characters are handled correctly.</p> <p>For the rest of this post, however, we will explain how to create <em>integration tests</em> to test how different components interact.</p> <h2 id="integration-tests"><a class="zola-anchor" href="#integration-tests" aria-label="Anchor link for: integration-tests">🔗</a>Integration Tests</h2> <p>The convention for <a href="https://doc.rust-lang.org/book/ch11-03-test-organization.html#integration-tests">integration tests</a> in Rust is to put them into a <code>tests</code> directory in the project root (i.e., next to the <code>src</code> directory). 
Both the default test framework and custom test frameworks will automatically pick up and execute all tests in that directory.</p> <p>All integration tests are their own executables and completely separate from our <code>main.rs</code>. This means that each test needs to define its own entry point function. Let’s create an example integration test named <code>basic_boot</code> to see how it works in detail:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/basic_boot.rs </span><span> </span><span>#![no_std] </span><span>#![no_main] </span><span>#![feature(custom_test_frameworks)] </span><span>#![test_runner(crate::test_runner)] </span><span>#![reexport_test_harness_main </span><span style="color:#569cd6;">= </span><span style="color:#d69d85;">&quot;test_main&quot;</span><span>] </span><span> </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span> </span><span>#[no_mangle] </span><span style="color:#608b4e;">// don&#39;t mangle the name of this function </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> test_main(); </span><span> </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span><span> </span><span style="color:#569cd6;">fn </span><span>test_runner(tests: </span><span style="color:#569cd6;">&amp;</span><span>[</span><span style="color:#569cd6;">&amp;</span><span>dyn Fn()]) { </span><span> unimplemented!(); </span><span>} </span><span> </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>Since integration tests are separate executables, we need to provide all the crate attributes (<code>no_std</code>, <code>no_main</code>, <code>test_runner</code>, etc.) again. We also need to create a new entry point function <code>_start</code>, which calls the test entry point function <code>test_main</code>. We don’t need any <code>cfg(test)</code> attributes because integration test executables are never built in non-test mode.</p> <p>We use the <a href="https://doc.rust-lang.org/core/macro.unimplemented.html"><code>unimplemented</code></a> macro that always panics as a placeholder for the <code>test_runner</code> function and just <code>loop</code> in the <code>panic</code> handler for now. Ideally, we want to implement these functions exactly as we did in our <code>main.rs</code> using the <code>serial_println</code> macro and the <code>exit_qemu</code> function. The problem is that we don’t have access to these functions since tests are built completely separately from our <code>main.rs</code> executable.</p> <p>If you run <code>cargo test</code> at this stage, you will get an endless loop because the panic handler loops endlessly. You need to use the <code>ctrl+c</code> keyboard shortcut for exiting QEMU.</p> <h3 id="create-a-library"><a class="zola-anchor" href="#create-a-library" aria-label="Anchor link for: create-a-library">🔗</a>Create a Library</h3> <p>To make the required functions available to our integration test, we need to split off a library from our <code>main.rs</code>, which can be included by other crates and integration test executables. 
To do this, we create a new <code>src/lib.rs</code> file:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// src/lib.rs </span><span> </span><span>#![no_std] </span><span> </span></code></pre> <p>Like the <code>main.rs</code>, the <code>lib.rs</code> is a special file that is automatically recognized by cargo. The library is a separate compilation unit, so we need to specify the <code>#![no_std]</code> attribute again.</p> <p>To make our library work with <code>cargo test</code>, we need to also move the test functions and attributes from <code>main.rs</code> to <code>lib.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span>#![cfg_attr(test, no_main)] </span><span>#![feature(custom_test_frameworks)] </span><span>#![test_runner(crate::test_runner)] </span><span>#![reexport_test_harness_main </span><span style="color:#569cd6;">= </span><span style="color:#d69d85;">&quot;test_main&quot;</span><span>] </span><span> </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span> </span><span style="color:#569cd6;">pub trait </span><span>Testable { </span><span> </span><span style="color:#569cd6;">fn </span><span>run(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; (); </span><span>} </span><span> </span><span style="color:#569cd6;">impl</span><span>&lt;T&gt; Testable </span><span style="color:#569cd6;">for </span><span>T </span><span style="color:#569cd6;">where </span><span> T: Fn(), </span><span>{ </span><span> </span><span style="color:#569cd6;">fn </span><span>run(</span><span style="color:#569cd6;">&amp;</span><span>self) { </span><span> serial_print!(</span><span style="color:#d69d85;">&quot;{}...</span><span 
style="color:#e3bbab;">\t</span><span style="color:#d69d85;">&quot;</span><span>, core::any::type_name::&lt;T&gt;()); </span><span> self(); </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;[ok]&quot;</span><span>); </span><span> } </span><span>} </span><span> </span><span style="color:#569cd6;">pub fn </span><span>test_runner(tests: </span><span style="color:#569cd6;">&amp;</span><span>[</span><span style="color:#569cd6;">&amp;</span><span>dyn Testable]) { </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;Running {} tests&quot;</span><span>, tests.len()); </span><span> </span><span style="color:#569cd6;">for</span><span> test </span><span style="color:#569cd6;">in</span><span> tests { </span><span> test.run(); </span><span> } </span><span> exit_qemu(QemuExitCode::Success); </span><span>} </span><span> </span><span style="color:#569cd6;">pub fn </span><span>test_panic_handler(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;[failed]</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>); </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;Error: {}</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>, info); </span><span> exit_qemu(QemuExitCode::Failed); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span><span> </span><span style="color:#608b4e;">/// Entry point for `cargo test` </span><span>#[cfg(test)] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> test_main(); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span><span> </span><span>#[cfg(test)] </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> test_panic_handler(info) </span><span>} </span></code></pre> <p>To make our <code>test_runner</code> available to executables and integration tests, we make it public and don’t apply the <code>cfg(test)</code> attribute to it. We also factor out the implementation of our panic handler into a public <code>test_panic_handler</code> function, so that it is available for executables too.</p> <p>Since our <code>lib.rs</code> is tested independently of our <code>main.rs</code>, we need to add a <code>_start</code> entry point and a panic handler when the library is compiled in test mode. By using the <a href="https://doc.rust-lang.org/reference/conditional-compilation.html#the-cfg_attr-attribute"><code>cfg_attr</code></a> crate attribute, we conditionally enable the <code>no_main</code> attribute in this case.</p> <p>We also move over the <code>QemuExitCode</code> enum and the <code>exit_qemu</code> function and make them public:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span>#[derive(Debug, Clone, Copy, PartialEq, Eq)] </span><span>#[repr(u32)] </span><span style="color:#569cd6;">pub enum </span><span>QemuExitCode { </span><span> Success = </span><span style="color:#b5cea8;">0x10</span><span>, </span><span> Failed = </span><span style="color:#b5cea8;">0x11</span><span>, </span><span>} </span><span> </span><span style="color:#569cd6;">pub fn </span><span>exit_qemu(exit_code: QemuExitCode) { </span><span> 
</span><span style="color:#569cd6;">use </span><span>x86_64::instructions::port::Port; </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#569cd6;">let mut</span><span> port = Port::new(</span><span style="color:#b5cea8;">0xf4</span><span>); </span><span> port.write(exit_code </span><span style="color:#569cd6;">as u32</span><span>); </span><span> } </span><span>} </span></code></pre> <p>Now executables and integration tests can import these functions from the library and don’t need to define their own implementations. To also make <code>println</code> and <code>serial_println</code> available, we move the module declarations too:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">pub mod </span><span>serial; </span><span style="color:#569cd6;">pub mod </span><span>vga_buffer; </span></code></pre> <p>We make the modules public to make them usable outside of our library. 
This is also required for making our <code>println</code> and <code>serial_println</code> macros usable since they use the <code>_print</code> functions of the modules.</p> <p>Now we can update our <code>main.rs</code> to use the library:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#![no_std] </span><span>#![no_main] </span><span>#![feature(custom_test_frameworks)] </span><span>#![test_runner(blog_os::test_runner)] </span><span>#![reexport_test_harness_main </span><span style="color:#569cd6;">= </span><span style="color:#d69d85;">&quot;test_main&quot;</span><span>] </span><span> </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span style="color:#569cd6;">use </span><span>blog_os::println; </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> </span><span> #[cfg(test)] </span><span> test_main(); </span><span> </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span><span> </span><span style="color:#608b4e;">/// This function is called on panic. </span><span>#[cfg(not(test))] </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, info); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span><span> </span><span>#[cfg(test)] </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> blog_os::test_panic_handler(info) </span><span>} </span></code></pre> <p>The library is usable like a normal external crate. It is called <code>blog_os</code>, like our crate. The above code uses the <code>blog_os::test_runner</code> function in the <code>test_runner</code> attribute and the <code>blog_os::test_panic_handler</code> function in our <code>cfg(test)</code> panic handler. It also imports the <code>println</code> macro to make it available to our <code>_start</code> and <code>panic</code> functions.</p> <p>At this point, <code>cargo run</code> and <code>cargo test</code> should work again. Of course, <code>cargo test</code> still loops endlessly (you can exit with <code>ctrl+c</code>). Let’s fix this by using the required library functions in our integration test.</p> <h3 id="completing-the-integration-test"><a class="zola-anchor" href="#completing-the-integration-test" aria-label="Anchor link for: completing-the-integration-test">🔗</a>Completing the Integration Test</h3> <p>Like our <code>src/main.rs</code>, our <code>tests/basic_boot.rs</code> executable can import types from our new library. 
This allows us to import the missing components to complete our test:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/basic_boot.rs </span><span> </span><span>#![test_runner(blog_os::test_runner)] </span><span> </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> blog_os::test_panic_handler(info) </span><span>} </span></code></pre> <p>Instead of reimplementing the test runner, we use the <code>test_runner</code> function from our library by changing the <code>#![test_runner(crate::test_runner)]</code> attribute to <code>#![test_runner(blog_os::test_runner)]</code>. We then don’t need the <code>test_runner</code> stub function in <code>basic_boot.rs</code> anymore, so we can remove it. For our <code>panic</code> handler, we call the <code>blog_os::test_panic_handler</code> function like we did in our <code>main.rs</code>.</p> <p>Now <code>cargo test</code> exits normally again. When you run it, you will see that it builds and runs the tests for our <code>lib.rs</code>, <code>main.rs</code>, and <code>basic_boot.rs</code> separately, one after another. For the <code>main.rs</code> and the <code>basic_boot</code> integration tests, it reports “Running 0 tests” since these files don’t have any functions annotated with <code>#[test_case]</code>.</p> <p>We can now add tests to our <code>basic_boot.rs</code>. 
For example, we can test that <code>println</code> works without panicking, like we did in the VGA buffer tests:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/basic_boot.rs </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::println; </span><span> </span><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>test_println() { </span><span> println!(</span><span style="color:#d69d85;">&quot;test_println output&quot;</span><span>); </span><span>} </span></code></pre> <p>When we run <code>cargo test</code> now, we see that it finds and executes the test function.</p> <p>The test might seem a bit useless right now since it’s almost identical to one of the VGA buffer tests. However, in the future, the <code>_start</code> functions of our <code>main.rs</code> and <code>lib.rs</code> might grow and call various initialization routines before running the <code>test_main</code> function, so that the two tests are executed in very different environments.</p> <p>By testing <code>println</code> in a <code>basic_boot</code> environment without calling any initialization routines in <code>_start</code>, we can ensure that <code>println</code> works right after booting. This is important because we rely on it, e.g., for printing panic messages.</p> <h3 id="future-tests"><a class="zola-anchor" href="#future-tests" aria-label="Anchor link for: future-tests">🔗</a>Future Tests</h3> <p>The power of integration tests is that they’re treated as completely separate executables. This gives them complete control over the environment, which makes it possible to test that the code interacts correctly with the CPU or hardware devices.</p> <p>Our <code>basic_boot</code> test is a very simple example of an integration test. 
In the future, our kernel will become much more featureful and interact with the hardware in various ways. By adding integration tests, we can ensure that these interactions work (and keep working) as expected. Some ideas for possible future tests are:</p> <ul> <li><strong>CPU Exceptions</strong>: When the code performs invalid operations (e.g., divides by zero), the CPU throws an exception. The kernel can register handler functions for such exceptions. An integration test could verify that the correct exception handler is called when a CPU exception occurs or that the execution continues correctly after a resolvable exception.</li> <li><strong>Page Tables</strong>: Page tables define which memory regions are valid and accessible. By modifying the page tables, it is possible to allocate new memory regions, for example when launching programs. An integration test could modify the page tables in the <code>_start</code> function and verify that the modifications have the desired effects in <code>#[test_case]</code> functions.</li> <li><strong>Userspace Programs</strong>: Userspace programs are programs with limited access to the system’s resources. For example, they don’t have access to kernel data structures or to the memory of other programs. An integration test could launch userspace programs that perform forbidden operations and verify that the kernel prevents them all.</li> </ul> <p>As you can imagine, many more tests are possible. By adding such tests, we can ensure that we don’t break them accidentally when we add new features to our kernel or refactor our code. 
This is especially important when our kernel becomes larger and more complex.</p> <h3 id="tests-that-should-panic"><a class="zola-anchor" href="#tests-that-should-panic" aria-label="Anchor link for: tests-that-should-panic">🔗</a>Tests that Should Panic</h3> <p>The test framework of the standard library supports a <a href="https://doc.rust-lang.org/rust-by-example/testing/unit_testing.html#testing-panics"><code>#[should_panic]</code> attribute</a> that allows constructing tests that should fail. This is useful, for example, to verify that a function fails when an invalid argument is passed. Unfortunately, this attribute isn’t supported in <code>#[no_std]</code> crates since it requires support from the standard library.</p> <p>While we can’t use the <code>#[should_panic]</code> attribute in our kernel, we can get similar behavior by creating an integration test that exits with a success exit code from the panic handler. Let’s start creating such a test with the name <code>should_panic</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/should_panic.rs </span><span> </span><span>#![no_std] </span><span>#![no_main] </span><span> </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span style="color:#569cd6;">use </span><span>blog_os::{QemuExitCode, exit_qemu, serial_println}; </span><span> </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(_info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;[ok]&quot;</span><span>); </span><span> exit_qemu(QemuExitCode::Success); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>This test is still incomplete as it doesn’t define a <code>_start</code> function or any of the custom test runner attributes yet. Let’s add the missing parts:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/should_panic.rs </span><span> </span><span>#![feature(custom_test_frameworks)] </span><span>#![test_runner(test_runner)] </span><span>#![reexport_test_harness_main </span><span style="color:#569cd6;">= </span><span style="color:#d69d85;">&quot;test_main&quot;</span><span>] </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> test_main(); </span><span> </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span><span> </span><span style="color:#569cd6;">pub fn </span><span>test_runner(tests: </span><span style="color:#569cd6;">&amp;</span><span>[</span><span style="color:#569cd6;">&amp;</span><span>dyn Fn()]) { </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;Running {} tests&quot;</span><span>, tests.len()); </span><span> </span><span style="color:#569cd6;">for</span><span> test </span><span style="color:#569cd6;">in</span><span> tests { </span><span> test(); </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;[test did not panic]&quot;</span><span>); </span><span> exit_qemu(QemuExitCode::Failed); </span><span> } </span><span> exit_qemu(QemuExitCode::Success); </span><span>} </span></code></pre> <p>Instead of reusing the <code>test_runner</code> from our <code>lib.rs</code>, the test defines its own <code>test_runner</code> function that exits with a failure exit code when a test returns without panicking (we want our tests to panic). If no test function is defined, the runner exits with a success exit code. 
Since the runner always exits after running a single test, it does not make sense to define more than one <code>#[test_case]</code> function.</p> <p>Now we can create a test that should fail:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/should_panic.rs </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::serial_print; </span><span> </span><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>should_fail() { </span><span> serial_print!(</span><span style="color:#d69d85;">&quot;should_panic::should_fail...</span><span style="color:#e3bbab;">\t</span><span style="color:#d69d85;">&quot;</span><span>); </span><span> assert_eq!(</span><span style="color:#b5cea8;">0</span><span>, </span><span style="color:#b5cea8;">1</span><span>); </span><span>} </span></code></pre> <p>The test uses <code>assert_eq</code> to assert that <code>0</code> and <code>1</code> are equal. Of course, this fails, so our test panics as desired. Note that we need to manually print the function name using <code>serial_print!</code> here because we don’t use the <code>Testable</code> trait.</p> <p>When we run the test through <code>cargo test --test should_panic</code> we see that it is successful because the test panicked as expected. When we comment out the assertion and run the test again, we see that it indeed fails with the <em>“test did not panic”</em> message.</p> <p>A significant drawback of this approach is that it only works for a single test function. With multiple <code>#[test_case]</code> functions, only the first function is executed because the execution cannot continue after the panic handler has been called. 
I currently don’t know of a good way to solve this problem, so let me know if you have an idea!</p> <h3 id="no-harness-tests"><a class="zola-anchor" href="#no-harness-tests" aria-label="Anchor link for: no-harness-tests">🔗</a>No Harness Tests</h3> <p>For integration tests that only have a single test function (like our <code>should_panic</code> test), the test runner isn’t really needed. For cases like this, we can disable the test runner completely and run our test directly in the <code>_start</code> function.</p> <p>The key to this is to disable the <code>harness</code> flag for the test in the <code>Cargo.toml</code>, which defines whether a test runner is used for an integration test. When it’s set to <code>false</code>, both the default test runner and the custom test runner feature are disabled, so that the test is treated like a normal executable.</p> <p>Let’s disable the <code>harness</code> flag for our <code>should_panic</code> test:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[[</span><span style="color:#808080;">test</span><span>]] </span><span style="color:#569cd6;">name </span><span>= </span><span style="color:#d69d85;">&quot;should_panic&quot; </span><span style="color:#569cd6;">harness </span><span>= </span><span style="color:#569cd6;">false </span></code></pre> <p>Now we vastly simplify our <code>should_panic</code> test by removing the <code>test_runner</code>-related code. 
The result looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/should_panic.rs </span><span> </span><span>#![no_std] </span><span>#![no_main] </span><span> </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span style="color:#569cd6;">use </span><span>blog_os::{exit_qemu, serial_print, serial_println, QemuExitCode}; </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> should_fail(); </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;[test did not panic]&quot;</span><span>); </span><span> exit_qemu(QemuExitCode::Failed); </span><span> </span><span style="color:#569cd6;">loop</span><span>{} </span><span>} </span><span> </span><span style="color:#569cd6;">fn </span><span>should_fail() { </span><span> serial_print!(</span><span style="color:#d69d85;">&quot;should_panic::should_fail...</span><span style="color:#e3bbab;">\t</span><span style="color:#d69d85;">&quot;</span><span>); </span><span> assert_eq!(</span><span style="color:#b5cea8;">0</span><span>, </span><span style="color:#b5cea8;">1</span><span>); </span><span>} </span><span> </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(_info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;[ok]&quot;</span><span>); </span><span> exit_qemu(QemuExitCode::Success); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>We now call the <code>should_fail</code> function directly from our <code>_start</code> function and exit with a failure exit code if it returns. When we run <code>cargo test --test should_panic</code> now, we see that the test behaves exactly as before.</p> <p>Apart from creating <code>should_panic</code> tests, disabling the <code>harness</code> attribute can also be useful for complex integration tests, for example, when the individual test functions have side effects and need to be run in a specified order.</p> <h2 id="summary"><a class="zola-anchor" href="#summary" aria-label="Anchor link for: summary">🔗</a>Summary</h2> <p>Testing is a very useful technique to ensure that certain components have the desired behavior. Even if they cannot show the absence of bugs, they’re still a useful tool for finding them and especially for avoiding regressions.</p> <p>This post explained how to set up a test framework for our Rust kernel. We used Rust’s custom test frameworks feature to implement support for a simple <code>#[test_case]</code> attribute in our bare-metal environment. Using the <code>isa-debug-exit</code> device of QEMU, our test runner can exit QEMU after running the tests and report the test status. To print error messages to the console instead of the VGA buffer, we created a basic driver for the serial port.</p> <p>After creating some tests for our <code>println</code> macro, we explored integration tests in the second half of the post. We learned that they live in the <code>tests</code> directory and are treated as completely separate executables. 
To give them access to the <code>exit_qemu</code> function and the <code>serial_println</code> macro, we moved most of our code into a library that can be imported by all executables and integration tests. Since integration tests run in their own separate environment, they make it possible to test interactions with the hardware or to create tests that should panic.</p> <p>We now have a test framework that runs in a realistic environment inside QEMU. By creating more tests in future posts, we can keep our kernel maintainable when it becomes more complex.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>In the next post, we will explore <em>CPU exceptions</em>. These exceptions are thrown by the CPU when something illegal happens, such as a division by zero or an access to an unmapped memory page (a so-called “page fault”). Being able to catch and examine these exceptions is very important for debugging future errors. Exception handling is also very similar to the handling of hardware interrupts, which is required for keyboard support.</p> Paging Implementation Thu, 14 Mar 2019 00:00:00 +0000 https://os.phil-opp.com/paging-implementation/ https://os.phil-opp.com/paging-implementation/ <p>This post shows how to implement paging support in our kernel. It first explores different techniques to make the physical page table frames accessible to the kernel and discusses their respective advantages and drawbacks. It then implements an address translation function and a function to create a new mapping.</p> <span id="continue-reading"></span> <p>This blog is openly developed on <a href="https://github.com/phil-opp/blog_os">GitHub</a>. If you have any problems or questions, please open an issue there. You can also leave comments <a href="https://os.phil-opp.com/paging-implementation/#comments">at the bottom</a>. 
The complete source code for this post can be found in the <a href="https://github.com/phil-opp/blog_os/tree/post-09"><code>post-09</code></a> branch.</p> <!-- fix for zola anchor checker (target is in template): <a id="comments"> --> <!-- toc --> <h2 id="introduction"><a class="zola-anchor" href="#introduction" aria-label="Anchor link for: introduction">🔗</a>Introduction</h2> <p>The <a href="https://os.phil-opp.com/paging-introduction/">previous post</a> gave an introduction to the concept of paging. It motivated paging by comparing it with segmentation, explained how paging and page tables work, and then introduced the 4-level page table design of <code>x86_64</code>. We found out that the bootloader already set up a page table hierarchy for our kernel, which means that our kernel already runs on virtual addresses. This improves safety since illegal memory accesses cause page fault exceptions instead of modifying arbitrary physical memory.</p> <p>The post ended with the problem that we <a href="https://os.phil-opp.com/paging-introduction/#accessing-the-page-tables">can’t access the page tables from our kernel</a> because they are stored in physical memory and our kernel already runs on virtual addresses. This post explores different approaches to making the page table frames accessible to our kernel. We will discuss the advantages and drawbacks of each approach and then decide on an approach for our kernel.</p> <p>To implement the approach, we will need support from the bootloader, so we’ll configure it first. Afterward, we will implement a function that traverses the page table hierarchy in order to translate virtual to physical addresses. 
Finally, we learn how to create new mappings in the page tables and how to find unused memory frames for creating new page tables.</p> <h2 id="accessing-page-tables"><a class="zola-anchor" href="#accessing-page-tables" aria-label="Anchor link for: accessing-page-tables">🔗</a>Accessing Page Tables</h2> <p>Accessing the page tables from our kernel is not as easy as it may seem. To understand the problem, let’s take a look at the example 4-level page table hierarchy from the previous post again:</p> <p><img src="../paging-introduction/x86_64-page-table-translation.svg" alt="An example 4-level page hierarchy with each page table shown in physical memory" /></p> <p>The important thing here is that each page entry stores the <em>physical</em> address of the next table. This avoids the need to run a translation for these addresses too, which would be bad for performance and could easily cause endless translation loops.</p> <p>The problem for us is that we can’t directly access physical addresses from our kernel since our kernel also runs on top of virtual addresses. For example, when we access address <code>4 KiB</code> we access the <em>virtual</em> address <code>4 KiB</code>, not the <em>physical</em> address <code>4 KiB</code> where the level 4 page table is stored. When we want to access the physical address <code>4 KiB</code>, we can only do so through some virtual address that maps to it.</p> <p>So in order to access page table frames, we need to map some virtual pages to them. 
There are different ways to create these mappings that all allow us to access arbitrary page table frames.</p> <h3 id="identity-mapping"><a class="zola-anchor" href="#identity-mapping" aria-label="Anchor link for: identity-mapping">🔗</a>Identity Mapping</h3> <p>A simple solution is to <strong>identity map all page tables</strong>:</p> <p><img src="https://os.phil-opp.com/paging-implementation/identity-mapped-page-tables.svg" alt="A virtual and a physical address space with various virtual pages mapped to the physical frame with the same address" /></p> <p>In this example, we see various identity-mapped page table frames. This way, the physical addresses of page tables are also valid virtual addresses so that we can easily access the page tables of all levels starting from the CR3 register.</p> <p>However, this approach clutters the virtual address space and makes it more difficult to find contiguous memory regions of larger sizes. For example, imagine that we want to create a virtual memory region of size 1000 KiB in the above graphic, e.g., for <a href="https://en.wikipedia.org/wiki/Memory-mapped_file">memory-mapping a file</a>. We can’t start the region at <code>28 KiB</code> because it would collide with the already mapped page at <code>1004 KiB</code>. So we have to look further until we find a large enough unmapped area, for example at <code>1008 KiB</code>. This is a fragmentation problem similar to the one we saw with <a href="https://os.phil-opp.com/paging-introduction/#fragmentation">segmentation</a>.</p> <p>It also makes it much more difficult to create new page tables because we need to find physical frames whose corresponding pages aren’t already in use. For example, let’s assume that we reserved the <em>virtual</em> 1000 KiB memory region starting at <code>1008 KiB</code> for our memory-mapped file. 
Now we can’t use any frame with a <em>physical</em> address between <code>1008 KiB</code> and <code>2008 KiB</code> anymore, because we can’t identity map it.</p> <h3 id="map-at-a-fixed-offset"><a class="zola-anchor" href="#map-at-a-fixed-offset" aria-label="Anchor link for: map-at-a-fixed-offset">🔗</a>Map at a Fixed Offset</h3> <p>To avoid the problem of cluttering the virtual address space, we can <strong>use a separate memory region for page table mappings</strong>. So instead of identity mapping page table frames, we map them at a fixed offset in the virtual address space. For example, the offset could be 10 TiB:</p> <p><img src="https://os.phil-opp.com/paging-implementation/page-tables-mapped-at-offset.svg" alt="The same figure as for the identity mapping, but each mapped virtual page is offset by 10 TiB." /></p> <p>By using the virtual memory in the range <code>10 TiB..(10 TiB + physical memory size)</code> exclusively for page table mappings, we avoid the collision problems of the identity mapping. Reserving such a large region of the virtual address space is only possible if the virtual address space is much larger than the physical memory size. This isn’t a problem on x86_64 since the 48-bit address space is 256 TiB large.</p> <p>This approach still has the disadvantage that we need to create a new mapping whenever we create a new page table. 
Also, it does not allow accessing page tables of other address spaces, which would be useful when creating a new process.</p> <h3 id="map-the-complete-physical-memory"><a class="zola-anchor" href="#map-the-complete-physical-memory" aria-label="Anchor link for: map-the-complete-physical-memory">🔗</a>Map the Complete Physical Memory</h3> <p>We can solve these problems by <strong>mapping the complete physical memory</strong> instead of only page table frames:</p> <p><img src="https://os.phil-opp.com/paging-implementation/map-complete-physical-memory.svg" alt="The same figure as for the offset mapping, but every physical frame has a mapping (at 10 TiB + X) instead of only page table frames." /></p> <p>This approach allows our kernel to access arbitrary physical memory, including page table frames of other address spaces. The reserved virtual memory range has the same size as before, with the difference that it no longer contains unmapped pages.</p> <p>The disadvantage of this approach is that additional page tables are needed for storing the mapping of the physical memory. These page tables need to be stored somewhere, so they use up a part of physical memory, which can be a problem on devices with a small amount of memory.</p> <p>On x86_64, however, we can use <a href="https://en.wikipedia.org/wiki/Page_%28computer_memory%29#Multiple_page_sizes">huge pages</a> with a size of 2 MiB for the mapping, instead of the default 4 KiB pages. This way, mapping 32 GiB of physical memory only requires 132 KiB for page tables since only one level 3 table and 32 level 2 tables are needed. 
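</p> <p>The 132 KiB figure can be checked with a few lines of arithmetic. The following is only a sketch: the constant and function names are made up for this illustration, and it assumes that the level 4 table already exists and that the mapped region starts at an address aligned to 512 GiB, so that no additional tables are needed at the boundaries.</p>

```rust
// Sizes in bytes.
const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;
const GIB: u64 = 1024 * MIB;

const TABLE_SIZE: u64 = 4 * KIB; // every page table occupies one 4 KiB frame
const ENTRIES_PER_TABLE: u64 = 512; // 512 entries of 8 bytes each
const HUGE_PAGE_SIZE: u64 = 2 * MIB; // one level 2 entry maps one huge page

/// Returns the number of (level 2, level 3) tables needed to map
/// `phys_size` bytes of physical memory with 2 MiB huge pages.
/// Hypothetical helper, not part of the kernel or of any crate.
fn tables_for_huge_mapping(phys_size: u64) -> (u64, u64) {
    let huge_pages = phys_size.div_ceil(HUGE_PAGE_SIZE);
    let level_2_tables = huge_pages.div_ceil(ENTRIES_PER_TABLE);
    let level_3_tables = level_2_tables.div_ceil(ENTRIES_PER_TABLE);
    (level_2_tables, level_3_tables)
}

fn main() {
    let (l2, l3) = tables_for_huge_mapping(32 * GIB);
    let bytes = (l2 + l3) * TABLE_SIZE;
    // prints "32 level 2 tables, 1 level 3 table(s), 132 KiB"
    println!("{} level 2 tables, {} level 3 table(s), {} KiB", l2, l3, bytes / KIB);
}
```

<p>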
Huge pages are also more cache efficient since they use fewer entries in the translation lookaside buffer (TLB).</p> <h3 id="temporary-mapping"><a class="zola-anchor" href="#temporary-mapping" aria-label="Anchor link for: temporary-mapping">🔗</a>Temporary Mapping</h3> <p>For devices with very small amounts of physical memory, we could <strong>map the page table frames only temporarily</strong> when we need to access them. To be able to create the temporary mappings, we only need a single identity-mapped level 1 table:</p> <p><img src="https://os.phil-opp.com/paging-implementation/temporarily-mapped-page-tables.svg" alt="A virtual and a physical address space with an identity mapped level 1 table, which maps its 0th entry to the level 2 table frame, thereby mapping that frame to the page with address 0" /></p> <p>The level 1 table in this graphic controls the first 2 MiB of the virtual address space. This is because it is reachable by starting at the CR3 register and following the 0th entry in the level 4, level 3, and level 2 page tables. The entry with index <code>8</code> maps the virtual page at address <code>32 KiB</code> to the physical frame at address <code>32 KiB</code>, thereby identity mapping the level 1 table itself. The graphic shows this identity-mapping by the horizontal arrow at <code>32 KiB</code>.</p> <p>By writing to the identity-mapped level 1 table, our kernel can create up to 511 temporary mappings (512 minus the entry required for the identity mapping). 
In the above example, the kernel created two temporary mappings:</p> <ul> <li>By mapping the 0th entry of the level 1 table to the frame with address <code>24 KiB</code>, it created a temporary mapping of the virtual page at <code>0 KiB</code> to the physical frame of the level 2 page table, indicated by the dashed arrow.</li> <li>By mapping the 9th entry of the level 1 table to the frame with address <code>4 KiB</code>, it created a temporary mapping of the virtual page at <code>36 KiB</code> to the physical frame of the level 4 page table, indicated by the dashed arrow.</li> </ul> <p>Now the kernel can access the level 2 page table by writing to page <code>0 KiB</code> and the level 4 page table by writing to page <code>36 KiB</code>.</p> <p>The process for accessing an arbitrary page table frame with temporary mappings would be:</p> <ul> <li>Search for a free entry in the identity-mapped level 1 table.</li> <li>Map that entry to the physical frame of the page table that we want to access.</li> <li>Access the target frame through the virtual page that maps to the entry.</li> <li>Set the entry back to unused, thereby removing the temporary mapping again.</li> </ul> <p>This approach reuses the same 512 virtual pages for creating the mappings and thus requires only 4 KiB of physical memory. The drawback is that it is a bit cumbersome, especially since a new mapping might require modifications to multiple table levels, which means that we would need to repeat the above process multiple times.</p> <h3 id="recursive-page-tables"><a class="zola-anchor" href="#recursive-page-tables" aria-label="Anchor link for: recursive-page-tables">🔗</a>Recursive Page Tables</h3> <p>Another interesting approach, which requires no additional page tables at all, is to <strong>map the page table recursively</strong>. The idea behind this approach is to map an entry from the level 4 page table to the level 4 table itself. 
By doing this, we effectively reserve a part of the virtual address space and map all current and future page table frames to that space.</p> <p>Let’s go through an example to understand how this all works:</p> <p><img src="https://os.phil-opp.com/paging-implementation/recursive-page-table.png" alt="An example 4-level page hierarchy with each page table shown in physical memory. Entry 511 of the level 4 page is mapped to frame 4KiB, the frame of the level 4 table itself." /></p> <p>The only difference to the <a href="https://os.phil-opp.com/paging-implementation/#accessing-page-tables">example at the beginning of this post</a> is the additional entry at index <code>511</code> in the level 4 table, which is mapped to physical frame <code>4 KiB</code>, the frame of the level 4 table itself.</p> <p>By letting the CPU follow this entry on a translation, it doesn’t reach a level 3 table but the same level 4 table again. This is similar to a recursive function that calls itself, therefore this table is called a <em>recursive page table</em>. The important thing is that the CPU assumes that every entry in the level 4 table points to a level 3 table, so it now treats the level 4 table as a level 3 table. This works because tables of all levels have the exact same layout on x86_64.</p> <p>By following the recursive entry one or multiple times before we start the actual translation, we can effectively shorten the number of levels that the CPU traverses. For example, if we follow the recursive entry once and then proceed to the level 3 table, the CPU thinks that the level 3 table is a level 2 table. Going further, it treats the level 2 table as a level 1 table and the level 1 table as the mapped frame. This means that we can now read and write the level 1 page table because the CPU thinks that it is the mapped frame. 
The graphic below illustrates the five translation steps:</p> <p><img src="https://os.phil-opp.com/paging-implementation/recursive-page-table-access-level-1.png" alt="The above example 4-level page hierarchy with 5 arrows: “Step 0” from CR3 to level 4 table, “Step 1” from level 4 table to level 4 table, “Step 2” from level 4 table to level 3 table, “Step 3” from level 3 table to level 2 table, and “Step 4” from level 2 table to level 1 table." /></p> <p>Similarly, we can follow the recursive entry twice before starting the translation to reduce the number of traversed levels to two:</p> <p><img src="https://os.phil-opp.com/paging-implementation/recursive-page-table-access-level-2.png" alt="The same 4-level page hierarchy with the following 4 arrows: “Step 0” from CR3 to level 4 table, “Steps 1&amp;2” from level 4 table to level 4 table, “Step 3” from level 4 table to level 3 table, and “Step 4” from level 3 table to level 2 table." /></p> <p>Let’s go through it step by step: First, the CPU follows the recursive entry on the level 4 table and thinks that it reaches a level 3 table. Then it follows the recursive entry again and thinks that it reaches a level 2 table. But in reality, it is still on the level 4 table. When the CPU now follows a different entry, it lands on a level 3 table but thinks it is already on a level 1 table. So while the next entry points to a level 2 table, the CPU thinks that it points to the mapped frame, which allows us to read and write the level 2 table.</p> <p>Accessing the tables of levels 3 and 4 works in the same way. To access the level 3 table, we follow the recursive entry three times, tricking the CPU into thinking it is already on a level 1 table. Then we follow another entry and reach a level 3 table, which the CPU treats as a mapped frame. 
For accessing the level 4 table itself, we just follow the recursive entry four times until the CPU treats the level 4 table itself as the mapped frame (in blue in the graphic below).</p> <p><img src="https://os.phil-opp.com/paging-implementation/recursive-page-table-access-level-3.png" alt="The same 4-level page hierarchy with the following 3 arrows: “Step 0” from CR3 to level 4 table, “Steps 1,2,3” from level 4 table to level 4 table, and “Step 4” from level 4 table to level 3 table. In blue, the alternative “Steps 1,2,3,4” arrow from level 4 table to level 4 table." /></p> <p>It might take some time to wrap your head around the concept, but it works quite well in practice.</p> <p>In the section below, we explain how to construct virtual addresses for following the recursive entry one or multiple times. We will not use recursive paging for our implementation, so you don’t need to read it to continue with the post. If it interests you, just click on <em>“Address Calculation”</em> to expand it.</p> <hr /> <details> <summary><h4>Address Calculation</h4></summary> <p>We saw that we can access tables of all levels by following the recursive entry once or multiple times before the actual translation. Since the indexes into the tables of the four levels are derived directly from the virtual address, we need to construct special virtual addresses for this technique. Remember, the page table indexes are derived from the address in the following way:</p> <p><img src="../paging-introduction/x86_64-table-indices-from-address.svg" alt="Bits 0–12 are the page offset, bits 12–21 the level 1 index, bits 21–30 the level 2 index, bits 30–39 the level 3 index, and bits 39–48 the level 4 index" /></p> <p>Let’s assume that we want to access the level 1 page table that maps a specific page. As we learned above, this means that we have to follow the recursive entry once before continuing with the level 4, level 3, and level 2 indexes. 
To do that, we move each block of the address one block to the right and set the original level 4 index to the index of the recursive entry:</p> <p><img src="https://os.phil-opp.com/paging-implementation/table-indices-from-address-recursive-level-1.svg" alt="Bits 0–12 are the offset into the level 1 table frame, bits 12–21 the level 2 index, bits 21–30 the level 3 index, bits 30–39 the level 4 index, and bits 39–48 the index of the recursive entry" /></p> <p>For accessing the level 2 table of that page, we move each index block two blocks to the right and set both the blocks of the original level 4 index and the original level 3 index to the index of the recursive entry:</p> <p><img src="https://os.phil-opp.com/paging-implementation/table-indices-from-address-recursive-level-2.svg" alt="Bits 0–12 are the offset into the level 2 table frame, bits 12–21 the level 3 index, bits 21–30 the level 4 index, and bits 30–39 and bits 39–48 are the index of the recursive entry" /></p> <p>Accessing the level 3 table works by moving each block three blocks to the right and using the recursive index for the original level 4, level 3, and level 2 address blocks:</p> <p><img src="https://os.phil-opp.com/paging-implementation/table-indices-from-address-recursive-level-3.svg" alt="Bits 0–12 are the offset into the level 3 table frame, bits 12–21 the level 4 index, and bits 21–30, bits 30–39 and bits 39–48 are the index of the recursive entry" /></p> <p>Finally, we can access the level 4 table by moving each block four blocks to the right and using the recursive index for all address blocks except for the offset:</p> <p><img src="https://os.phil-opp.com/paging-implementation/table-indices-from-address-recursive-level-4.svg" alt="Bits 0–12 are the offset into the level 4 table frame and bits 12–21, bits 21–30, bits 30–39, and bits 39–48 are the index of the recursive entry" /></p> <p>We can now calculate virtual addresses for the page tables of all four levels. 
We can even calculate an address that points exactly to a specific page table entry by multiplying its index by 8, the size of a page table entry.</p> <p>The table below summarizes the address structure for accessing the different kinds of frames:</p> <table><thead><tr><th>Virtual Address for</th><th>Address Structure (<a href="https://en.wikipedia.org/wiki/Octal">octal</a>)</th></tr></thead><tbody> <tr><td>Page</td><td><code>0o_SSSSSS_AAA_BBB_CCC_DDD_EEEE</code></td></tr> <tr><td>Level 1 Table Entry</td><td><code>0o_SSSSSS_RRR_AAA_BBB_CCC_DDDD</code></td></tr> <tr><td>Level 2 Table Entry</td><td><code>0o_SSSSSS_RRR_RRR_AAA_BBB_CCCC</code></td></tr> <tr><td>Level 3 Table Entry</td><td><code>0o_SSSSSS_RRR_RRR_RRR_AAA_BBBB</code></td></tr> <tr><td>Level 4 Table Entry</td><td><code>0o_SSSSSS_RRR_RRR_RRR_RRR_AAAA</code></td></tr> </tbody></table> <p>Here, <code>AAA</code> is the level 4 index, <code>BBB</code> the level 3 index, <code>CCC</code> the level 2 index, and <code>DDD</code> the level 1 index of the mapped frame, and <code>EEEE</code> the offset into it. <code>RRR</code> is the index of the recursive entry. When an index (three digits) is transformed to an offset (four digits), it is done by multiplying it by 8 (the size of a page table entry). With this offset, the resulting address directly points to the respective page table entry.</p> <p><code>SSSSSS</code> are sign extension bits, which means that they are all copies of bit 47. This is a special requirement for valid addresses on the x86_64 architecture. We explained it in the <a href="https://os.phil-opp.com/paging-introduction/#paging-on-x86-64">previous post</a>.</p> <p>We use <a href="https://en.wikipedia.org/wiki/Octal">octal</a> numbers for representing the addresses since each octal character represents three bits, which allows us to clearly separate the 9-bit indexes of the different page table levels. 
This isn’t possible with the hexadecimal system, where each character represents four bits.</p> <h5 id="in-rust-code"><a class="zola-anchor" href="#in-rust-code" aria-label="Anchor link for: in-rust-code">🔗</a>In Rust Code</h5> <p>To construct such addresses in Rust code, you can use bitwise operations:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// the virtual address whose corresponding page tables you want to access </span><span style="color:#569cd6;">let</span><span> addr: </span><span style="color:#569cd6;">usize </span><span>= […]; </span><span> </span><span style="color:#569cd6;">let</span><span> r = </span><span style="color:#b5cea8;">0o777</span><span>; </span><span style="color:#608b4e;">// recursive index </span><span style="color:#569cd6;">let</span><span> sign = </span><span style="color:#b5cea8;">0o177777 </span><span>&lt;&lt; </span><span style="color:#b5cea8;">48</span><span>; </span><span style="color:#608b4e;">// sign extension </span><span> </span><span style="color:#608b4e;">// retrieve the page table indices of the address that we want to translate </span><span style="color:#569cd6;">let</span><span> l4_idx = (addr &gt;&gt; </span><span style="color:#b5cea8;">39</span><span>) </span><span style="color:#569cd6;">&amp; </span><span style="color:#b5cea8;">0o777</span><span>; </span><span style="color:#608b4e;">// level 4 index </span><span style="color:#569cd6;">let</span><span> l3_idx = (addr &gt;&gt; </span><span style="color:#b5cea8;">30</span><span>) </span><span style="color:#569cd6;">&amp; </span><span style="color:#b5cea8;">0o777</span><span>; </span><span style="color:#608b4e;">// level 3 index </span><span style="color:#569cd6;">let</span><span> l2_idx = (addr &gt;&gt; </span><span style="color:#b5cea8;">21</span><span>) </span><span style="color:#569cd6;">&amp; </span><span 
style="color:#b5cea8;">0o777</span><span>; </span><span style="color:#608b4e;">// level 2 index </span><span style="color:#569cd6;">let</span><span> l1_idx = (addr &gt;&gt; </span><span style="color:#b5cea8;">12</span><span>) </span><span style="color:#569cd6;">&amp; </span><span style="color:#b5cea8;">0o777</span><span>; </span><span style="color:#608b4e;">// level 1 index </span><span style="color:#569cd6;">let</span><span> page_offset = addr </span><span style="color:#569cd6;">&amp; </span><span style="color:#b5cea8;">0o7777</span><span>; </span><span> </span><span style="color:#608b4e;">// calculate the table addresses </span><span style="color:#569cd6;">let</span><span> level_4_table_addr = </span><span> sign </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">39</span><span>) </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">30</span><span>) </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">21</span><span>) </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">12</span><span>); </span><span style="color:#569cd6;">let</span><span> level_3_table_addr = </span><span> sign </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">39</span><span>) </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">30</span><span>) </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">21</span><span>) </span><span style="color:#569cd6;">| </span><span>(l4_idx &lt;&lt; </span><span style="color:#b5cea8;">12</span><span>); </span><span style="color:#569cd6;">let</span><span> level_2_table_addr = </span><span> sign </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">39</span><span>) </span><span style="color:#569cd6;">| 
</span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">30</span><span>) </span><span style="color:#569cd6;">| </span><span>(l4_idx &lt;&lt; </span><span style="color:#b5cea8;">21</span><span>) </span><span style="color:#569cd6;">| </span><span>(l3_idx &lt;&lt; </span><span style="color:#b5cea8;">12</span><span>); </span><span style="color:#569cd6;">let</span><span> level_1_table_addr = </span><span> sign </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">39</span><span>) </span><span style="color:#569cd6;">| </span><span>(l4_idx &lt;&lt; </span><span style="color:#b5cea8;">30</span><span>) </span><span style="color:#569cd6;">| </span><span>(l3_idx &lt;&lt; </span><span style="color:#b5cea8;">21</span><span>) </span><span style="color:#569cd6;">| </span><span>(l2_idx &lt;&lt; </span><span style="color:#b5cea8;">12</span><span>); </span></code></pre> <p>The above code assumes that the last level 4 entry with index <code>0o777</code> (511) is recursively mapped. This isn’t the case currently, so the code won’t work yet. See below for how to tell the bootloader to set up the recursive mapping.</p> <p>As an alternative to performing the bitwise operations by hand, you can use the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.RecursivePageTable.html"><code>RecursivePageTable</code></a> type of the <code>x86_64</code> crate, which provides safe abstractions for various page table operations. 
For example, the code below shows how to translate a virtual address to its mapped physical address:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::paging::{Mapper, Page, PageTable, RecursivePageTable}; </span><span style="color:#569cd6;">use </span><span>x86_64::{VirtAddr, PhysAddr}; </span><span> </span><span style="color:#608b4e;">/// Creates a RecursivePageTable instance from the level 4 address. </span><span style="color:#569cd6;">let</span><span> level_4_table_addr = […]; </span><span style="color:#569cd6;">let</span><span> level_4_table_ptr = level_4_table_addr </span><span style="color:#569cd6;">as *mut</span><span> PageTable; </span><span style="color:#569cd6;">let</span><span> recursive_page_table = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> level_4_table = </span><span style="color:#569cd6;">&amp;mut </span><span>*level_4_table_ptr; </span><span> RecursivePageTable::new(level_4_table).unwrap() </span><span>}; </span><span> </span><span> </span><span style="color:#608b4e;">/// Retrieve the physical address for the given virtual address </span><span style="color:#569cd6;">let</span><span> addr: </span><span style="color:#569cd6;">u64 </span><span>= […]; </span><span style="color:#569cd6;">let</span><span> addr = VirtAddr::new(addr); </span><span style="color:#569cd6;">let</span><span> page: Page = Page::containing_address(addr); </span><span> </span><span style="color:#608b4e;">// perform the translation </span><span style="color:#569cd6;">let</span><span> frame = recursive_page_table.translate_page(page); </span><span>frame.map(|frame| frame.start_address() + </span><span style="color:#569cd6;">u64</span><span>::from(addr.page_offset())) 
</span></code></pre> <p>Again, a valid recursive mapping is required for this code. With such a mapping, the missing <code>level_4_table_addr</code> can be calculated as in the first code example.</p> </details> <hr /> <p>Recursive Paging is an interesting technique that shows how powerful a single mapping in a page table can be. It is relatively easy to implement and only requires a minimal amount of setup (just a single recursive entry), so it’s a good choice for first experiments with paging.</p> <p>However, it also has some disadvantages:</p> <ul> <li>It occupies a large amount of virtual memory (512 GiB). This isn’t a big problem in the large 48-bit address space, but it might lead to suboptimal cache behavior.</li> <li>It only allows accessing the currently active address space easily. Accessing other address spaces is still possible by changing the recursive entry, but a temporary mapping is required for switching back. We described how to do this in the (outdated) <a href="https://os.phil-opp.com/remap-the-kernel/#overview"><em>Remap The Kernel</em></a> post.</li> <li>It heavily relies on the page table format of x86 and might not work on other architectures.</li> </ul> <h2 id="bootloader-support"><a class="zola-anchor" href="#bootloader-support" aria-label="Anchor link for: bootloader-support">🔗</a>Bootloader Support</h2> <p>All of these approaches require page table modifications for their setup. For example, mappings for the physical memory need to be created or an entry of the level 4 table needs to be mapped recursively. The problem is that we can’t create these required mappings without an existing way to access the page tables.</p> <p>This means that we need the help of the bootloader, which creates the page tables that our kernel runs on. The bootloader has access to the page tables, so it can create any mappings that we need. 
In its current implementation, the <code>bootloader</code> crate has support for two of the above approaches, controlled through <a href="https://doc.rust-lang.org/cargo/reference/features.html#the-features-section">cargo features</a>:</p> <ul> <li>The <code>map_physical_memory</code> feature maps the complete physical memory somewhere into the virtual address space. Thus, the kernel has access to all physical memory and can follow the <a href="https://os.phil-opp.com/paging-implementation/#map-the-complete-physical-memory"><em>Map the Complete Physical Memory</em></a> approach.</li> <li>With the <code>recursive_page_table</code> feature, the bootloader maps an entry of the level 4 page table recursively. This allows the kernel to access the page tables as described in the <a href="https://os.phil-opp.com/paging-implementation/#recursive-page-tables"><em>Recursive Page Tables</em></a> section.</li> </ul> <p>We choose the first approach for our kernel since it is simple, platform-independent, and more powerful (it also allows access to non-page-table-frames). To enable the required bootloader support, we add the <code>map_physical_memory</code> feature to our <code>bootloader</code> dependency:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">bootloader </span><span>= { </span><span style="color:#569cd6;">version </span><span>= </span><span style="color:#d69d85;">&quot;0.9&quot;</span><span>, </span><span style="color:#569cd6;">features </span><span>= [</span><span style="color:#d69d85;">&quot;map_physical_memory&quot;</span><span>]} </span></code></pre> <p>With this feature enabled, the bootloader maps the complete physical memory to some unused virtual address range. 
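</p> <p>To make the effect of such a mapping concrete: once all physical memory is mapped at a fixed virtual offset, turning a physical address into a virtual one is a single addition. The sketch below is only an illustration; the helper name <code>phys_to_virt</code> and the offset used in the note after it are assumptions and not part of the <code>bootloader</code> API.</p>
<pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust">/// Hypothetical helper: returns the virtual address under which the given
/// physical address is reachable, assuming the complete physical memory is
/// mapped starting at `physical_memory_offset`.
fn phys_to_virt(physical_memory_offset: u64, phys: u64) -> u64 {
    physical_memory_offset
        .checked_add(phys)
        .expect("address overflow: offset + physical address exceeds u64")
}
</code></pre>
<p>For example, with an assumed offset of <code>0x0000_1000_0000_0000</code>, the VGA text buffer at physical address <code>0xb8000</code> would be reachable at the virtual address <code>0x0000_1000_000b_8000</code>.</p> <p>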
To communicate the virtual address range to our kernel, the bootloader passes a <em>boot information</em> structure.</p> <h3 id="boot-information"><a class="zola-anchor" href="#boot-information" aria-label="Anchor link for: boot-information">🔗</a>Boot Information</h3> <p>The <code>bootloader</code> crate defines a <a href="https://docs.rs/bootloader/0.9/bootloader/bootinfo/struct.BootInfo.html"><code>BootInfo</code></a> struct that contains all the information it passes to our kernel. The struct is still in an early stage, so expect some breakage when updating to future <a href="https://doc.rust-lang.org/stable/cargo/reference/specifying-dependencies.html#caret-requirements">semver-incompatible</a> bootloader versions. With the <code>map_physical_memory</code> feature enabled, it currently has the two fields <code>memory_map</code> and <code>physical_memory_offset</code>:</p> <ul> <li>The <code>memory_map</code> field contains an overview of the available physical memory. This tells our kernel how much physical memory is available in the system and which memory regions are reserved for devices such as the VGA hardware. The memory map can be queried from the BIOS or UEFI firmware, but only very early in the boot process. For this reason, it must be provided by the bootloader because there is no way for the kernel to retrieve it later. We will need the memory map later in this post.</li> <li>The <code>physical_memory_offset</code> tells us the virtual start address of the physical memory mapping. By adding this offset to a physical address, we get the corresponding virtual address. This allows us to access arbitrary physical memory from our kernel.</li> <li>This physical memory offset can be customized by adding a <code>[package.metadata.bootloader]</code> table in Cargo.toml and setting the field <code>physical-memory-offset = "0x0000f00000000000"</code> (or any other value). 
However, note that the bootloader can panic if it runs into physical address values that start to overlap with the space beyond the offset, i.e., areas it would have previously mapped to some other early physical addresses. So in general, the higher the value (&gt; 1 TiB), the better.</li> </ul> <p>The bootloader passes the <code>BootInfo</code> struct to our kernel in the form of a <code>&amp;'static BootInfo</code> argument to our <code>_start</code> function. We don’t have this argument declared in our function yet, so let’s add it:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">use </span><span>bootloader::BootInfo; </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span style="color:#608b4e;">// new argument </span><span> […] </span><span>} </span></code></pre> <p>It wasn’t a problem to leave off this argument before because the x86_64 calling convention passes the first argument in a CPU register. Thus, the argument is simply ignored when it isn’t declared. However, it would be a problem if we accidentally used a wrong argument type, since the compiler doesn’t know the correct type signature of our entry point function.</p> <h3 id="the-entry-point-macro"><a class="zola-anchor" href="#the-entry-point-macro" aria-label="Anchor link for: the-entry-point-macro">🔗</a>The <code>entry_point</code> Macro</h3> <p>Since our <code>_start</code> function is called externally from the bootloader, no checking of our function signature occurs. 
This means that we could let it take arbitrary arguments without any compilation errors, but it would fail or cause undefined behavior at runtime.</p> <p>To make sure that the entry point function always has the correct signature that the bootloader expects, the <code>bootloader</code> crate provides an <a href="https://docs.rs/bootloader/0.6.4/bootloader/macro.entry_point.html"><code>entry_point</code></a> macro that provides a type-checked way to define a Rust function as the entry point. Let’s rewrite our entry point function to use this macro:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">use </span><span>bootloader::{BootInfo, entry_point}; </span><span> </span><span>entry_point!(kernel_main); </span><span> </span><span style="color:#569cd6;">fn </span><span>kernel_main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> […] </span><span>} </span></code></pre> <p>We no longer need to use <code>extern "C"</code> or <code>no_mangle</code> for our entry point, as the macro defines the real lower level <code>_start</code> entry point for us. The <code>kernel_main</code> function is now a completely normal Rust function, so we can choose an arbitrary name for it. 
The important thing is that it is type-checked so that a compilation error occurs when we use a wrong function signature, for example by adding an argument or changing the argument type.</p> <p>Let’s perform the same change in our <code>lib.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span>#[cfg(test)] </span><span style="color:#569cd6;">use </span><span>bootloader::{entry_point, BootInfo}; </span><span> </span><span>#[cfg(test)] </span><span>entry_point!(test_kernel_main); </span><span> </span><span style="color:#608b4e;">/// Entry point for `cargo test` </span><span>#[cfg(test)] </span><span style="color:#569cd6;">fn </span><span>test_kernel_main(_boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#608b4e;">// like before </span><span> init(); </span><span> test_main(); </span><span> hlt_loop(); </span><span>} </span></code></pre> <p>Since the entry point is only used in test mode, we add the <code>#[cfg(test)]</code> attribute to all items. We give our test entry point the distinct name <code>test_kernel_main</code> to avoid confusion with the <code>kernel_main</code> of our <code>main.rs</code>. We don’t use the <code>BootInfo</code> parameter for now, so we prefix the parameter name with a <code>_</code> to silence the unused variable warning.</p> <h2 id="implementation"><a class="zola-anchor" href="#implementation" aria-label="Anchor link for: implementation">🔗</a>Implementation</h2> <p>Now that we have access to physical memory, we can finally start to implement our page table code. First, we will take a look at the currently active page tables that our kernel runs on. 
In the second step, we will create a translation function that returns the physical address that a given virtual address is mapped to. As a last step, we will try to modify the page tables in order to create a new mapping.</p> <p>Before we begin, we create a new <code>memory</code> module for our code:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">pub mod </span><span>memory; </span></code></pre> <p>For the module, we create an empty <code>src/memory.rs</code> file.</p> <h3 id="accessing-the-page-tables"><a class="zola-anchor" href="#accessing-the-page-tables" aria-label="Anchor link for: accessing-the-page-tables">🔗</a>Accessing the Page Tables</h3> <p>At the <a href="https://os.phil-opp.com/paging-introduction/#accessing-the-page-tables">end of the previous post</a>, we tried to take a look at the page tables our kernel runs on, but failed since we couldn’t access the physical frame that the <code>CR3</code> register points to. We’re now able to continue from there by creating an <code>active_level_4_table</code> function that returns a reference to the active level 4 page table:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::{ </span><span> structures::paging::PageTable, </span><span> VirtAddr, </span><span>}; </span><span> </span><span style="color:#608b4e;">/// Returns a mutable reference to the active level 4 table. 
</span><span style="color:#608b4e;">/// </span><span style="color:#608b4e;">/// This function is unsafe because the caller must guarantee that the </span><span style="color:#608b4e;">/// complete physical memory is mapped to virtual memory at the passed </span><span style="color:#608b4e;">/// `physical_memory_offset`. Also, this function must be only called once </span><span style="color:#608b4e;">/// to avoid aliasing `&amp;mut` references (which is undefined behavior). </span><span style="color:#569cd6;">pub unsafe fn </span><span>active_level_4_table(physical_memory_offset: VirtAddr) </span><span> -&gt; </span><span style="color:#569cd6;">&amp;&#39;static mut</span><span> PageTable </span><span>{ </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::registers::control::Cr3; </span><span> </span><span> </span><span style="color:#569cd6;">let </span><span>(level_4_table_frame, </span><span style="color:#569cd6;">_</span><span>) = Cr3::read(); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> phys = level_4_table_frame.start_address(); </span><span> </span><span style="color:#569cd6;">let</span><span> virt = physical_memory_offset + phys.as_u64(); </span><span> </span><span style="color:#569cd6;">let</span><span> page_table_ptr: </span><span style="color:#569cd6;">*mut</span><span> PageTable = virt.as_mut_ptr(); </span><span> </span><span> </span><span style="color:#569cd6;">&amp;mut </span><span>*page_table_ptr </span><span style="color:#608b4e;">// unsafe </span><span>} </span></code></pre> <p>First, we read the physical frame of the active level 4 table from the <code>CR3</code> register. We then take its physical start address, convert it to a <code>u64</code>, and add it to <code>physical_memory_offset</code> to get the virtual address where the page table frame is mapped. 
Finally, we convert the virtual address to a <code>*mut PageTable</code> raw pointer through the <code>as_mut_ptr</code> method and then unsafely create a <code>&amp;mut PageTable</code> reference from it. We create a <code>&amp;mut</code> reference instead of a <code>&amp;</code> reference because we will mutate the page tables later in this post.</p> <p>We don’t need to use an unsafe block here because Rust treats the complete body of an <code>unsafe fn</code> like a large <code>unsafe</code> block. This makes our code more dangerous since we could accidentally introduce an unsafe operation in previous lines without noticing. It also makes it much more difficult to spot unsafe operations in between safe operations. There is an <a href="https://github.com/rust-lang/rfcs/pull/2585">RFC</a> to change this behavior.</p> <p>We can now use this function to print the entries of the level 4 table:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">fn </span><span>kernel_main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::memory::active_level_4_table; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::VirtAddr; </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> blog_os::init(); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); </span><span> </span><span style="color:#569cd6;">let</span><span> l4_table = </span><span style="color:#569cd6;">unsafe </span><span>{ active_level_4_table(phys_mem_offset) }; </span><span> </span><span> </span><span style="color:#569cd6;">for </span><span>(i, entry) </span><span style="color:#569cd6;">in</span><span> l4_table.iter().enumerate() { </span><span> </span><span style="color:#569cd6;">if !</span><span>entry.is_unused() { </span><span> println!(</span><span style="color:#d69d85;">&quot;L4 Entry </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">: </span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, i, entry); </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">// as before </span><span> #[cfg(test)] </span><span> test_main(); </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> blog_os::hlt_loop(); </span><span>} </span></code></pre> <p>First, we convert the <code>physical_memory_offset</code> of the <code>BootInfo</code> struct to a <a href="https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.VirtAddr.html"><code>VirtAddr</code></a> and pass it to the <code>active_level_4_table</code> function. 
We then use the <code>iter</code> function to iterate over the page table entries and the <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate"><code>enumerate</code></a> combinator to additionally add an index <code>i</code> to each element. We only print non-empty entries because all 512 entries wouldn’t fit on the screen.</p> <p>When we run it, we see the following output:</p> <p><img src="https://os.phil-opp.com/paging-implementation/qemu-print-level-4-table.png" alt="QEMU printing entry 0 (0x2000, PRESENT, WRITABLE, ACCESSED), entry 1 (0x894000, PRESENT, WRITABLE, ACCESSED, DIRTY), entry 31 (0x88e000, PRESENT, WRITABLE, ACCESSED, DIRTY), entry 175 (0x891000, PRESENT, WRITABLE, ACCESSED, DIRTY), and entry 504 (0x897000, PRESENT, WRITABLE, ACCESSED, DIRTY)" /></p> <p>We see that there are various non-empty entries, which all map to different level 3 tables. There are so many regions because kernel code, kernel stack, physical memory mapping, and boot information all use separate memory areas.</p> <p>To traverse the page tables further and take a look at a level 3 table, we can take the mapped frame of an entry and convert it to a virtual address again:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in the `for` loop in src/main.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::paging::PageTable; </span><span> </span><span style="color:#569cd6;">if !</span><span>entry.is_unused() { </span><span> println!(</span><span style="color:#d69d85;">&quot;L4 Entry </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">: </span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, i, entry); </span><span> </span><span> </span><span style="color:#608b4e;">// get the physical address from the entry and convert it </span><span> 
</span><span style="color:#569cd6;">let</span><span> phys = entry.frame().unwrap().start_address(); </span><span> </span><span style="color:#569cd6;">let</span><span> virt = phys.as_u64() + boot_info.physical_memory_offset; </span><span> </span><span style="color:#569cd6;">let</span><span> ptr = VirtAddr::new(virt).as_mut_ptr(); </span><span> </span><span style="color:#569cd6;">let</span><span> l3_table: </span><span style="color:#569cd6;">&amp;</span><span>PageTable = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;</span><span>*ptr }; </span><span> </span><span> </span><span style="color:#608b4e;">// print non-empty entries of the level 3 table </span><span> </span><span style="color:#569cd6;">for </span><span>(i, entry) </span><span style="color:#569cd6;">in</span><span> l3_table.iter().enumerate() { </span><span> </span><span style="color:#569cd6;">if !</span><span>entry.is_unused() { </span><span> println!(</span><span style="color:#d69d85;">&quot; L3 Entry </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">: </span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, i, entry); </span><span> } </span><span> } </span><span>} </span></code></pre> <p>For looking at the level 2 and level 1 tables, we repeat that process for the level 3 and level 2 entries. As you can imagine, this gets very verbose very quickly, so we don’t show the full code here.</p> <p>Traversing the page tables manually is interesting because it helps to understand how the CPU performs the translation. 
However, most of the time, we are only interested in the mapped physical address for a given virtual address, so let’s create a function for that.</p> <h3 id="translating-addresses"><a class="zola-anchor" href="#translating-addresses" aria-label="Anchor link for: translating-addresses">🔗</a>Translating Addresses</h3> <p>To translate a virtual to a physical address, we have to traverse the four-level page table until we reach the mapped frame. Let’s create a function that performs this translation:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::PhysAddr; </span><span> </span><span style="color:#608b4e;">/// Translates the given virtual address to the mapped physical address, or </span><span style="color:#608b4e;">/// `None` if the address is not mapped. </span><span style="color:#608b4e;">/// </span><span style="color:#608b4e;">/// This function is unsafe because the caller must guarantee that the </span><span style="color:#608b4e;">/// complete physical memory is mapped to virtual memory at the passed </span><span style="color:#608b4e;">/// `physical_memory_offset`. </span><span style="color:#569cd6;">pub unsafe fn </span><span>translate_addr(addr: VirtAddr, physical_memory_offset: VirtAddr) </span><span> -&gt; Option&lt;PhysAddr&gt; </span><span>{ </span><span> translate_addr_inner(addr, physical_memory_offset) </span><span>} </span></code></pre> <p>We forward the function to a safe <code>translate_addr_inner</code> function to limit the scope of <code>unsafe</code>. As we noted above, Rust treats the complete body of an <code>unsafe fn</code> like a large unsafe block. 
By calling into a private safe function, we make each <code>unsafe</code> operation explicit again.</p> <p>The private inner function contains the real implementation:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#608b4e;">/// Private function that is called by `translate_addr`. </span><span style="color:#608b4e;">/// </span><span style="color:#608b4e;">/// This function is safe to limit the scope of `unsafe` because Rust treats </span><span style="color:#608b4e;">/// the whole body of unsafe functions as an unsafe block. This function must </span><span style="color:#608b4e;">/// only be reachable through `unsafe fn` from outside of this module. </span><span style="color:#569cd6;">fn </span><span>translate_addr_inner(addr: VirtAddr, physical_memory_offset: VirtAddr) </span><span> -&gt; Option&lt;PhysAddr&gt; </span><span>{ </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::paging::page_table::FrameError; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::registers::control::Cr3; </span><span> </span><span> </span><span style="color:#608b4e;">// read the active level 4 frame from the CR3 register </span><span> </span><span style="color:#569cd6;">let </span><span>(level_4_table_frame, </span><span style="color:#569cd6;">_</span><span>) = Cr3::read(); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> table_indexes = [ </span><span> addr.p4_index(), addr.p3_index(), addr.p2_index(), addr.p1_index() </span><span> ]; </span><span> </span><span style="color:#569cd6;">let mut</span><span> frame = level_4_table_frame; </span><span> </span><span> </span><span style="color:#608b4e;">// traverse the multi-level page table </span><span> </span><span style="color:#569cd6;">for &amp;</span><span>index 
</span><span style="color:#569cd6;">in &amp;</span><span>table_indexes { </span><span> </span><span style="color:#608b4e;">// convert the frame into a page table reference </span><span> </span><span style="color:#569cd6;">let</span><span> virt = physical_memory_offset + frame.start_address().as_u64(); </span><span> </span><span style="color:#569cd6;">let</span><span> table_ptr: </span><span style="color:#569cd6;">*const</span><span> PageTable = virt.as_ptr(); </span><span> </span><span style="color:#569cd6;">let</span><span> table = </span><span style="color:#569cd6;">unsafe </span><span>{</span><span style="color:#569cd6;">&amp;</span><span>*table_ptr}; </span><span> </span><span> </span><span style="color:#608b4e;">// read the page table entry and update `frame` </span><span> </span><span style="color:#569cd6;">let</span><span> entry = </span><span style="color:#569cd6;">&amp;</span><span>table[index]; </span><span> frame = </span><span style="color:#569cd6;">match</span><span> entry.frame() { </span><span> Ok(frame) </span><span style="color:#569cd6;">=&gt;</span><span> frame, </span><span> Err(FrameError::FrameNotPresent) </span><span style="color:#569cd6;">=&gt; return </span><span>None, </span><span> Err(FrameError::HugeFrame) </span><span style="color:#569cd6;">=&gt; </span><span>panic!(</span><span style="color:#d69d85;">&quot;huge pages not supported&quot;</span><span>), </span><span> }; </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">// calculate the physical address by adding the page offset </span><span> Some(frame.start_address() + </span><span style="color:#569cd6;">u64</span><span>::from(addr.page_offset())) </span><span>} </span></code></pre> <p>Instead of reusing our <code>active_level_4_table</code> function, we read the level 4 frame from the <code>CR3</code> register again. We do this because it simplifies this prototype implementation. 
Don’t worry, we will create a better solution in a moment.</p> <p>The <code>VirtAddr</code> struct already provides methods to compute the indexes into the page tables of the four levels. We store these indexes in a small array because it allows us to traverse the page tables using a <code>for</code> loop. Outside of the loop, we remember the last visited <code>frame</code> to calculate the physical address later. The <code>frame</code> points to page table frames while iterating and to the mapped frame after the last iteration, i.e., after following the level 1 entry.</p> <p>Inside the loop, we again use the <code>physical_memory_offset</code> to convert the frame into a page table reference. We then read the entry of the current page table and use the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTableEntry.html#method.frame"><code>PageTableEntry::frame</code></a> function to retrieve the mapped frame. If the entry is not mapped to a frame, we return <code>None</code>. If the entry maps a huge 2 MiB or 1 GiB page, we panic for now.</p> <p>Let’s test our translation function by translating some addresses:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">fn </span><span>kernel_main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#608b4e;">// new import </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::memory::translate_addr; </span><span> </span><span> […] </span><span style="color:#608b4e;">// hello world and blog_os::init </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> addresses = [ </span><span> </span><span style="color:#608b4e;">// the identity-mapped vga buffer page </span><span> </span><span style="color:#b5cea8;">0xb8000</span><span>, </span><span> </span><span style="color:#608b4e;">// some code page </span><span> </span><span style="color:#b5cea8;">0x201008</span><span>, </span><span> </span><span style="color:#608b4e;">// some stack page </span><span> </span><span style="color:#b5cea8;">0x0100_0020_1a10</span><span>, </span><span> </span><span style="color:#608b4e;">// virtual address mapped to physical address 0 </span><span> boot_info.physical_memory_offset, </span><span> ]; </span><span> </span><span> </span><span style="color:#569cd6;">for &amp;</span><span>address </span><span style="color:#569cd6;">in &amp;</span><span>addresses { </span><span> </span><span style="color:#569cd6;">let</span><span> virt = VirtAddr::new(address); </span><span> </span><span style="color:#569cd6;">let</span><span> phys = </span><span style="color:#569cd6;">unsafe </span><span>{ translate_addr(virt, phys_mem_offset) }; </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;"> -&gt; </span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, virt, phys); </span><span> } </span><span> </span><span> […] </span><span style="color:#608b4e;">// test_main(), &quot;it did not crash&quot; printing, and hlt_loop() </span><span>} 
</span></code></pre> <p>When we run it, we see the following output:</p> <p><img src="https://os.phil-opp.com/paging-implementation/qemu-translate-addr.png" alt="0xb8000 -&gt; 0xb8000, 0x201008 -&gt; 0x401008, 0x10000201a10 -&gt; 0x279a10, “panicked at ‘huge pages not supported’" /></p> <p>As expected, the identity-mapped address <code>0xb8000</code> translates to the same physical address. The code page and the stack page translate to some arbitrary physical addresses, which depend on how the bootloader created the initial mapping for our kernel. It’s worth noting that the last 12 bits always stay the same after translation, which makes sense because these bits are the <a href="https://os.phil-opp.com/paging-introduction/#paging-on-x86-64"><em>page offset</em></a> and not part of the translation.</p> <p>Since each physical address can be accessed by adding the <code>physical_memory_offset</code>, the translation of the <code>physical_memory_offset</code> address itself should point to physical address <code>0</code>. However, the translation fails because the mapping uses huge pages for efficiency, which is not supported in our implementation yet.</p> <h3 id="using-offsetpagetable"><a class="zola-anchor" href="#using-offsetpagetable" aria-label="Anchor link for: using-offsetpagetable">🔗</a>Using <code>OffsetPageTable</code></h3> <p>Translating virtual to physical addresses is a common task in an OS kernel, therefore the <code>x86_64</code> crate provides an abstraction for it. 
The implementation already supports huge pages and several other page table functions apart from <code>translate_addr</code>, so we will use it in the following instead of adding huge page support to our own implementation.</p> <p>At the basis of the abstraction are two traits that define various page table mapping functions:</p> <ul> <li>The <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html"><code>Mapper</code></a> trait is generic over the page size and provides functions that operate on pages. Examples are <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#tymethod.translate_page"><code>translate_page</code></a>, which translates a given page to a frame of the same size, and <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to"><code>map_to</code></a>, which creates a new mapping in the page table.</li> <li>The <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html"><code>Translate</code></a> trait provides functions that work with multiple page sizes, such as <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html#method.translate_addr"><code>translate_addr</code></a> or the general <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html"><code>translate</code></a>.</li> </ul> <p>The traits only define the interface, they don’t provide any implementation. The <code>x86_64</code> crate currently provides three types that implement the traits with different requirements. The <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.OffsetPageTable.html"><code>OffsetPageTable</code></a> type assumes that the complete physical memory is mapped to the virtual address space at some offset. 
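</p> <p>The assumption behind <code>OffsetPageTable</code> boils down to simple address arithmetic, sketched below. The constant and the helper name <code>phys_to_virt</code> are hypothetical and chosen only for illustration; the offset value is the one visible in this post’s screenshots, while real code would read it from <code>boot_info.physical_memory_offset</code>:</p>

```rust
/// Assumed offset for illustration only. The real value is provided by the
/// bootloader at runtime.
const PHYSICAL_MEMORY_OFFSET: u64 = 0x180_0000_0000;

/// Returns the virtual address through which the physical address `phys`
/// is reachable when all physical memory is mapped at a fixed offset.
fn phys_to_virt(phys: u64) -> u64 {
    PHYSICAL_MEMORY_OFFSET + phys
}

fn main() {
    // Physical address 0 is reachable at the offset itself.
    assert_eq!(phys_to_virt(0), 0x180_0000_0000);
    // The VGA text buffer frame at physical address 0xb8000.
    assert_eq!(phys_to_virt(0xb8000), 0x180_000b_8000);
    println!("{:#x}", phys_to_virt(0xb8000));
}
```

<p>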
The <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MappedPageTable.html"><code>MappedPageTable</code></a> is a bit more flexible: It only requires that each page table frame is mapped to the virtual address space at a calculable address. Finally, the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.RecursivePageTable.html"><code>RecursivePageTable</code></a> type can be used to access page table frames through <a href="https://os.phil-opp.com/paging-implementation/#recursive-page-tables">recursive page tables</a>.</p> <p>In our case, the bootloader maps the complete physical memory at a virtual address specified by the <code>physical_memory_offset</code> variable, so we can use the <code>OffsetPageTable</code> type. To initialize it, we create a new <code>init</code> function in our <code>memory</code> module:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>x86_64::structures::paging::OffsetPageTable; </span><span> </span><span style="color:#608b4e;">/// Initialize a new OffsetPageTable. </span><span style="color:#608b4e;">/// </span><span style="color:#608b4e;">/// This function is unsafe because the caller must guarantee that the </span><span style="color:#608b4e;">/// complete physical memory is mapped to virtual memory at the passed </span><span style="color:#608b4e;">/// `physical_memory_offset`. Also, this function must be only called once </span><span style="color:#608b4e;">/// to avoid aliasing `&amp;mut` references (which is undefined behavior). 
</span><span style="color:#569cd6;">pub unsafe fn </span><span>init(physical_memory_offset: VirtAddr) -&gt; OffsetPageTable&lt;</span><span style="color:#569cd6;">&#39;static</span><span>&gt; { </span><span> </span><span style="color:#569cd6;">let</span><span> level_4_table = active_level_4_table(physical_memory_offset); </span><span> OffsetPageTable::new(level_4_table, physical_memory_offset) </span><span>} </span><span> </span><span style="color:#608b4e;">// make private </span><span style="color:#569cd6;">unsafe fn </span><span>active_level_4_table(physical_memory_offset: VirtAddr) </span><span> -&gt; </span><span style="color:#569cd6;">&amp;&#39;static mut</span><span> PageTable </span><span>{…} </span></code></pre> <p>The function takes the <code>physical_memory_offset</code> as an argument and returns a new <code>OffsetPageTable</code> instance with a <code>'static</code> lifetime. This means that the instance stays valid for the complete runtime of our kernel. In the function body, we first call the <code>active_level_4_table</code> function to retrieve a mutable reference to the level 4 page table. We then invoke the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.OffsetPageTable.html#method.new"><code>OffsetPageTable::new</code></a> function with this reference. As the second parameter, the <code>new</code> function expects the virtual address at which the mapping of the physical memory starts, which is given in the <code>physical_memory_offset</code> variable.</p> <p>The <code>active_level_4_table</code> function should only be called from the <code>init</code> function from now on because it can easily lead to aliased mutable references when called multiple times, which can cause undefined behavior. For this reason, we make the function private by removing the <code>pub</code> specifier.</p> <p>We can now use the <code>Translate::translate_addr</code> method instead of our own <code>memory::translate_addr</code> function. 
We only need to change a few lines in our <code>kernel_main</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">fn </span><span>kernel_main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#608b4e;">// new: different imports </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::memory; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::{structures::paging::Translate, VirtAddr}; </span><span> </span><span> […] </span><span style="color:#608b4e;">// hello world and blog_os::init </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); </span><span> </span><span style="color:#608b4e;">// new: initialize a mapper </span><span> </span><span style="color:#569cd6;">let</span><span> mapper = </span><span style="color:#569cd6;">unsafe </span><span>{ memory::init(phys_mem_offset) }; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> addresses = […]; </span><span style="color:#608b4e;">// same as before </span><span> </span><span> </span><span style="color:#569cd6;">for &amp;</span><span>address </span><span style="color:#569cd6;">in &amp;</span><span>addresses { </span><span> </span><span style="color:#569cd6;">let</span><span> virt = VirtAddr::new(address); </span><span> </span><span style="color:#608b4e;">// new: use the `mapper.translate_addr` method </span><span> </span><span style="color:#569cd6;">let</span><span> phys = mapper.translate_addr(virt); </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{:?}</span><span 
style="color:#d69d85;"> -&gt; </span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, virt, phys); </span><span> } </span><span> </span><span> […] </span><span style="color:#608b4e;">// test_main(), &quot;it did not crash&quot; printing, and hlt_loop() </span><span>} </span></code></pre> <p>We need to import the <code>Translate</code> trait in order to use the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html#method.translate_addr"><code>translate_addr</code></a> method it provides.</p> <p>When we run it now, we see the same translation results as before, with the difference that the huge page translation now also works:</p> <p><img src="https://os.phil-opp.com/paging-implementation/qemu-mapper-translate-addr.png" alt="0xb8000 -&gt; 0xb8000, 0x201008 -&gt; 0x401008, 0x10000201a10 -&gt; 0x279a10, 0x18000000000 -&gt; 0x0" /></p> <p>As expected, the translations of <code>0xb8000</code> and the code and stack addresses stay the same as with our own translation function. Additionally, we now see that the virtual address <code>physical_memory_offset</code> is mapped to the physical address <code>0x0</code>.</p> <p>By using the translation function of the <code>OffsetPageTable</code> type, we can spare ourselves the work of implementing huge page support. We also have access to other page table functions, such as <code>map_to</code>, which we will use in the next section.</p> <p>At this point, we no longer need our <code>memory::translate_addr</code> and <code>memory::translate_addr_inner</code> functions, so we can delete them.</p> <h3 id="creating-a-new-mapping"><a class="zola-anchor" href="#creating-a-new-mapping" aria-label="Anchor link for: creating-a-new-mapping">🔗</a>Creating a new Mapping</h3> <p>Until now, we only looked at the page tables without modifying anything. 
Let’s change that by creating a new mapping for a previously unmapped page.</p> <p>We will use the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to"><code>map_to</code></a> function of the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html"><code>Mapper</code></a> trait for our implementation, so let’s take a look at that function first. The documentation tells us that it takes four arguments: the page that we want to map, the frame that the page should be mapped to, a set of flags for the page table entry, and a <code>frame_allocator</code>. The frame allocator is needed because mapping the given page might require creating additional page tables, which need unused frames as backing storage.</p> <h4 id="a-create-example-mapping-function"><a class="zola-anchor" href="#a-create-example-mapping-function" aria-label="Anchor link for: a-create-example-mapping-function">🔗</a>A <code>create_example_mapping</code> Function</h4> <p>The first step of our implementation is to create a new <code>create_example_mapping</code> function that maps a given virtual page to <code>0xb8000</code>, the physical frame of the VGA text buffer. 
We choose that frame because it allows us to easily test if the mapping was created correctly: We just need to write to the newly mapped page and see whether we see the write appear on the screen.</p> <p>The <code>create_example_mapping</code> function looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::{ </span><span> PhysAddr, </span><span> structures::paging::{Page, PhysFrame, Mapper, Size4KiB, FrameAllocator} </span><span>}; </span><span> </span><span style="color:#608b4e;">/// Creates an example mapping for the given page to frame `0xb8000`. </span><span style="color:#569cd6;">pub fn </span><span>create_example_mapping( </span><span> page: Page, </span><span> mapper: </span><span style="color:#569cd6;">&amp;mut</span><span> OffsetPageTable, </span><span> frame_allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> impl FrameAllocator&lt;Size4KiB&gt;, </span><span>) { </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::paging::PageTableFlags </span><span style="color:#569cd6;">as</span><span> Flags; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> frame = PhysFrame::containing_address(PhysAddr::new(</span><span style="color:#b5cea8;">0xb8000</span><span>)); </span><span> </span><span style="color:#569cd6;">let</span><span> flags = Flags::</span><span style="color:#b4cea8;">PRESENT </span><span style="color:#569cd6;">| </span><span>Flags::</span><span style="color:#b4cea8;">WRITABLE</span><span>; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> map_to_result = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#608b4e;">// FIXME: this is not safe, we do it only for testing 
</span><span> mapper.map_to(page, frame, flags, frame_allocator) </span><span> }; </span><span> map_to_result.expect(</span><span style="color:#d69d85;">&quot;map_to failed&quot;</span><span>).flush(); </span><span>} </span></code></pre> <p>In addition to the <code>page</code> that should be mapped, the function expects a mutable reference to an <code>OffsetPageTable</code> instance and a <code>frame_allocator</code>. The <code>frame_allocator</code> parameter uses the <a href="https://doc.rust-lang.org/book/ch10-02-traits.html#traits-as-parameters"><code>impl Trait</code></a> syntax to be <a href="https://doc.rust-lang.org/book/ch10-00-generics.html">generic</a> over all types that implement the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html"><code>FrameAllocator</code></a> trait. The trait is generic over the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/trait.PageSize.html"><code>PageSize</code></a> trait to work with both standard 4 KiB pages and huge 2 MiB/1 GiB pages. We only want to create a 4 KiB mapping, so we set the generic parameter to <code>Size4KiB</code>.</p> <p>The <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to"><code>map_to</code></a> method is unsafe because the caller must ensure that the frame is not already in use. The reason for this is that mapping the same frame twice could result in undefined behavior, for example when two different <code>&amp;mut</code> references point to the same physical memory location. In our case, we reuse the VGA text buffer frame, which is already mapped, so we break the required condition. However, the <code>create_example_mapping</code> function is only a temporary testing function and will be removed after this post, so it is ok. 
To remind us of the unsafety, we put a <code>FIXME</code> comment on the line.</p> <p>In addition to the <code>page</code> and the <code>frame</code>, the <code>map_to</code> method takes a set of flags for the mapping and a reference to the <code>frame_allocator</code>, which will be explained in a moment. For the flags, we set the <code>PRESENT</code> flag because it is required for all valid entries and the <code>WRITABLE</code> flag to make the mapped page writable. For a list of all possible flags, see the <a href="https://os.phil-opp.com/paging-introduction/#page-table-format"><em>Page Table Format</em></a> section of the previous post.</p> <p>The <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to"><code>map_to</code></a> function can fail, so it returns a <a href="https://doc.rust-lang.org/core/result/enum.Result.html"><code>Result</code></a>. Since this is only example code that does not need to be robust, we simply use <a href="https://doc.rust-lang.org/core/result/enum.Result.html#method.expect"><code>expect</code></a> to panic when an error occurs. On success, the function returns a <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html"><code>MapperFlush</code></a> type that provides an easy way to flush the newly mapped page from the translation lookaside buffer (TLB) with its <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html#method.flush"><code>flush</code></a> method. 
Like <code>Result</code>, the type uses the <a href="https://doc.rust-lang.org/std/result/#results-must-be-used"><code>#[must_use]</code></a> attribute to emit a warning when we accidentally forget to use it.</p> <h4 id="a-dummy-frameallocator"><a class="zola-anchor" href="#a-dummy-frameallocator" aria-label="Anchor link for: a-dummy-frameallocator">🔗</a>A dummy <code>FrameAllocator</code></h4> <p>To be able to call <code>create_example_mapping</code>, we need to create a type that implements the <code>FrameAllocator</code> trait first. As noted above, the trait is responsible for allocating frames for new page tables if they are needed by <code>map_to</code>.</p> <p>Let’s start with the simple case and assume that we don’t need to create new page tables. For this case, a frame allocator that always returns <code>None</code> suffices. We create such an <code>EmptyFrameAllocator</code> for testing our mapping function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#608b4e;">/// A FrameAllocator that always returns `None`. </span><span style="color:#569cd6;">pub struct </span><span>EmptyFrameAllocator; </span><span> </span><span style="color:#569cd6;">unsafe impl </span><span>FrameAllocator&lt;Size4KiB&gt; </span><span style="color:#569cd6;">for </span><span>EmptyFrameAllocator { </span><span> </span><span style="color:#569cd6;">fn </span><span>allocate_frame(</span><span style="color:#569cd6;">&amp;mut </span><span>self) -&gt; Option&lt;PhysFrame&gt; { </span><span> None </span><span> } </span><span>} </span></code></pre> <p>Implementing the <code>FrameAllocator</code> is unsafe because the implementer must guarantee that the allocator yields only unused frames. Otherwise, undefined behavior might occur, for example when two virtual pages are mapped to the same physical frame. 
Our <code>EmptyFrameAllocator</code> only returns <code>None</code>, so this isn’t a problem in this case.</p> <h4 id="choosing-a-virtual-page"><a class="zola-anchor" href="#choosing-a-virtual-page" aria-label="Anchor link for: choosing-a-virtual-page">🔗</a>Choosing a Virtual Page</h4> <p>We now have a simple frame allocator that we can pass to our <code>create_example_mapping</code> function. However, the allocator always returns <code>None</code>, so this will only work if no additional page table frames are needed for creating the mapping. To understand when additional page table frames are needed and when not, let’s consider an example:</p> <p><img src="https://os.phil-opp.com/paging-implementation/required-page-frames-example.svg" alt="A virtual and a physical address space with a single mapped page and the page tables of all four levels" /></p> <p>The graphic shows the virtual address space on the left, the physical address space on the right, and the page tables in between. The page tables are stored in physical memory frames, indicated by the dashed lines. The virtual address space contains a single mapped page at address <code>0x803fe00000</code>, marked in blue. To translate this page to its frame, the CPU walks the 4-level page table until it reaches the frame at address 36 KiB.</p> <p>Additionally, the graphic shows the physical frame of the VGA text buffer in red. Our goal is to map a previously unmapped virtual page to this frame using our <code>create_example_mapping</code> function. Since our <code>EmptyFrameAllocator</code> always returns <code>None</code>, we want to create the mapping so that no additional frames are needed from the allocator. This depends on the virtual page that we select for the mapping.</p> <p>The graphic shows two candidate pages in the virtual address space, both marked in yellow. One page is at address <code>0x803fdfd000</code>, which is 3 pages before the mapped page (in blue). 
While the level 4 and level 3 page table indices are the same as for the blue page, the level 2 and level 1 indices are different (see the <a href="https://os.phil-opp.com/paging-introduction/#paging-on-x86-64">previous post</a>). The different index into the level 2 table means that a different level 1 table is used for this page. Since this level 1 table does not exist yet, we would need to create it if we chose that page for our example mapping, which would require an additional unused physical frame. In contrast, the second candidate page at address <code>0x803fe02000</code> does not have this problem because it uses the same level 1 page table as the blue page. Thus, all the required page tables already exist.</p> <p>In summary, the difficulty of creating a new mapping depends on the virtual page that we want to map. In the easiest case, the level 1 page table for the page already exists and we just need to write a single entry. In the most difficult case, the page is in a memory region for which no level 3 table exists yet, so we need to create new level 3, level 2, and level 1 page tables first.</p> <p>To call our <code>create_example_mapping</code> function with the <code>EmptyFrameAllocator</code>, we need to choose a page for which all page tables already exist. To find such a page, we can utilize the fact that the bootloader loads itself into the first megabyte of the virtual address space. This means that a valid level 1 table exists for all pages in this region. Thus, we can choose any unused page in this memory region for our example mapping, such as the page at address <code>0</code>. 
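</p> <p>To double-check the index reasoning above, the following sketch splits each of the discussed addresses into its four page table indices. The helper <code>table_indices</code> is a hypothetical name used only for illustration; in the kernel, the <code>p4_index</code> to <code>p1_index</code> methods of <code>VirtAddr</code> serve the same purpose:</p>

```rust
/// Splits a virtual address into its page table indices, ordered as
/// [level 4, level 3, level 2, level 1]. Each index is 9 bits wide; the
/// lowest 12 bits (the page offset) are ignored.
fn table_indices(addr: u64) -> [u64; 4] {
    [
        (addr >> 39) & 0x1ff,
        (addr >> 30) & 0x1ff,
        (addr >> 21) & 0x1ff,
        (addr >> 12) & 0x1ff,
    ]
}

fn main() {
    // The mapped (blue) page from the graphic.
    assert_eq!(table_indices(0x803fe00000), [1, 0, 511, 0]);
    // First candidate: different level 2 index, so a new level 1 table is needed.
    assert_eq!(table_indices(0x803fdfd000), [1, 0, 510, 509]);
    // Second candidate: same level 4, 3, and 2 indices as the blue page.
    assert_eq!(table_indices(0x803fe02000), [1, 0, 511, 2]);
    // The identity-mapped VGA buffer page in the first megabyte.
    assert_eq!(table_indices(0xb8000), [0, 0, 0, 184]);
    // The page at address 0.
    assert_eq!(table_indices(0x0), [0, 0, 0, 0]);
}
```

<p>The last assertion covers the page at address <code>0</code>: it shares its level 4, level 3, and level 2 indices with the identity-mapped VGA buffer page at <code>0xb8000</code>, so it uses the same, already existing level 1 table.</p> <p>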
Normally, this page should stay unused to guarantee that dereferencing a null pointer causes a page fault, so we know that the bootloader leaves it unmapped.</p> <h4 id="creating-the-mapping"><a class="zola-anchor" href="#creating-the-mapping" aria-label="Anchor link for: creating-the-mapping">🔗</a>Creating the Mapping</h4> <p>We now have all the required parameters for calling our <code>create_example_mapping</code> function, so let’s modify our <code>kernel_main</code> function to map the page at virtual address <code>0</code>. Since we map the page to the frame of the VGA text buffer, we should be able to write to the screen through it afterward. The implementation looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">fn </span><span>kernel_main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::memory; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::{structures::paging::Page, VirtAddr}; </span><span style="color:#608b4e;">// new import </span><span> </span><span> […] </span><span style="color:#608b4e;">// hello world and blog_os::init </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); </span><span> </span><span style="color:#569cd6;">let mut</span><span> mapper = </span><span style="color:#569cd6;">unsafe </span><span>{ memory::init(phys_mem_offset) }; </span><span> </span><span style="color:#569cd6;">let mut</span><span> frame_allocator = memory::EmptyFrameAllocator; </span><span> </span><span> </span><span style="color:#608b4e;">// map an unused page </span><span> </span><span style="color:#569cd6;">let</span><span> page = Page::containing_address(VirtAddr::new(</span><span style="color:#b5cea8;">0</span><span>)); </span><span> memory::create_example_mapping(page, </span><span style="color:#569cd6;">&amp;mut</span><span> mapper, </span><span style="color:#569cd6;">&amp;mut</span><span> frame_allocator); </span><span> </span><span> </span><span style="color:#608b4e;">// write the string `New!` to the screen through the new mapping </span><span> </span><span style="color:#569cd6;">let</span><span> page_ptr: </span><span style="color:#569cd6;">*mut u64 </span><span>= page.start_address().as_mut_ptr(); </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ page_ptr.offset(</span><span style="color:#b5cea8;">400</span><span>).write_volatile(</span><span style="color:#b5cea8;">0x_f021_f077_f065_f04e</span><span>)}; </span><span> </span><span> […] </span><span style="color:#608b4e;">// test_main(), &quot;it did not crash&quot; printing, and hlt_loop() </span><span>} </span></code></pre> <p>We first create the mapping for the page at 
address <code>0</code> by calling our <code>create_example_mapping</code> function with a mutable reference to the <code>mapper</code> and the <code>frame_allocator</code> instances. This maps the page to the VGA text buffer frame, so we should see any write to it on the screen.</p> <p>Then we convert the page to a raw pointer and write a value to offset <code>400</code>. We don’t write to the start of the page because the top line of the VGA buffer is directly shifted off the screen by the next <code>println</code>. We write the value <code>0x_f021_f077_f065_f04e</code>, which represents the string <em>“New!”</em> on a white background. As we learned <a href="https://os.phil-opp.com/vga-text-mode/#volatile">in the <em>“VGA Text Mode”</em> post</a>, writes to the VGA buffer should be volatile, so we use the <a href="https://doc.rust-lang.org/std/primitive.pointer.html#method.write_volatile"><code>write_volatile</code></a> method.</p> <p>When we run it in QEMU, we see the following output:</p> <p><img src="https://os.phil-opp.com/paging-implementation/qemu-new-mapping.png" alt="QEMU printing “It did not crash!” with four completely white cells in the middle of the screen" /></p> <p>The <em>“New!”</em> on the screen is caused by our write to page <code>0</code>, which means that we successfully created a new mapping in the page tables.</p> <p>Creating that mapping only worked because the level 1 table responsible for the page at address <code>0</code> already exists. When we try to map a page for which no level 1 table exists yet, the <code>map_to</code> function fails because it tries to create new page tables by allocating frames with the <code>EmptyFrameAllocator</code>. 
We can see that happen when we try to map page <code>0xdeadbeaf000</code> instead of <code>0</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">fn </span><span>kernel_main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> […] </span><span> </span><span style="color:#569cd6;">let</span><span> page = Page::containing_address(VirtAddr::new(</span><span style="color:#b5cea8;">0xdeadbeaf000</span><span>)); </span><span> […] </span><span>} </span></code></pre> <p>When we run it, a panic with the following error message occurs:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>panicked at &#39;map_to failed: FrameAllocationFailed&#39;, /…/result.rs:999:5 </span></code></pre> <p>To map pages that don’t have a level 1 page table yet, we need to create a proper <code>FrameAllocator</code>. But how do we know which frames are unused and how much physical memory is available?</p> <h3 id="allocating-frames"><a class="zola-anchor" href="#allocating-frames" aria-label="Anchor link for: allocating-frames">🔗</a>Allocating Frames</h3> <p>In order to create new page tables, we need to create a proper frame allocator. To do that, we use the <code>memory_map</code> that is passed by the bootloader as part of the <code>BootInfo</code> struct:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#569cd6;">use </span><span>bootloader::bootinfo::MemoryMap; </span><span> </span><span style="color:#608b4e;">/// A FrameAllocator that returns usable frames from the bootloader&#39;s memory map. 
</span><span style="color:#569cd6;">pub struct </span><span>BootInfoFrameAllocator { </span><span> memory_map: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> MemoryMap, </span><span> next: </span><span style="color:#569cd6;">usize</span><span>, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>BootInfoFrameAllocator { </span><span> </span><span style="color:#608b4e;">/// Create a FrameAllocator from the passed memory map. </span><span> </span><span style="color:#608b4e;">/// </span><span> </span><span style="color:#608b4e;">/// This function is unsafe because the caller must guarantee that the passed </span><span> </span><span style="color:#608b4e;">/// memory map is valid. The main requirement is that all frames that are marked </span><span> </span><span style="color:#608b4e;">/// as `USABLE` in it are really unused. </span><span> </span><span style="color:#569cd6;">pub unsafe fn </span><span>init(memory_map: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> MemoryMap) -&gt; </span><span style="color:#569cd6;">Self </span><span>{ </span><span> BootInfoFrameAllocator { </span><span> memory_map, </span><span> next: </span><span style="color:#b5cea8;">0</span><span>, </span><span> } </span><span> } </span><span>} </span></code></pre> <p>The struct has two fields: A <code>'static</code> reference to the memory map passed by the bootloader and a <code>next</code> field that keeps track of the number of the next frame that the allocator should return.</p> <p>As we explained in the <a href="https://os.phil-opp.com/paging-implementation/#boot-information"><em>Boot Information</em></a> section, the memory map is provided by the BIOS/UEFI firmware. It can only be queried very early in the boot process, so the bootloader already calls the respective functions for us. 
The memory map consists of a list of <a href="https://docs.rs/bootloader/0.6.4/bootloader/bootinfo/struct.MemoryRegion.html"><code>MemoryRegion</code></a> structs, which contain the start address, the length, and the type (e.g. unused, reserved, etc.) of each memory region.</p> <p>The <code>init</code> function initializes a <code>BootInfoFrameAllocator</code> with a given memory map. The <code>next</code> field is initialized with <code>0</code> and will be increased for every frame allocation to avoid returning the same frame twice. Since we don’t know if the usable frames of the memory map were already used somewhere else, our <code>init</code> function must be <code>unsafe</code> to require additional guarantees from the caller.</p> <h4 id="a-usable-frames-method"><a class="zola-anchor" href="#a-usable-frames-method" aria-label="Anchor link for: a-usable-frames-method">🔗</a>A <code>usable_frames</code> Method</h4> <p>Before we implement the <code>FrameAllocator</code> trait, we add an auxiliary method that converts the memory map into an iterator of usable frames:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#569cd6;">use </span><span>bootloader::bootinfo::MemoryRegionType; </span><span> </span><span style="color:#569cd6;">impl </span><span>BootInfoFrameAllocator { </span><span> </span><span style="color:#608b4e;">/// Returns an iterator over the usable frames specified in the memory map. 
</span><span> </span><span style="color:#569cd6;">fn </span><span>usable_frames(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; impl Iterator&lt;Item = PhysFrame&gt; { </span><span> </span><span style="color:#608b4e;">// get usable regions from memory map </span><span> </span><span style="color:#569cd6;">let</span><span> regions = self.memory_map.iter(); </span><span> </span><span style="color:#569cd6;">let</span><span> usable_regions = regions </span><span> .filter(|r| r.region_type == MemoryRegionType::Usable); </span><span> </span><span style="color:#608b4e;">// map each region to its address range </span><span> </span><span style="color:#569cd6;">let</span><span> addr_ranges = usable_regions </span><span> .map(|r| r.range.start_addr()</span><span style="color:#569cd6;">..</span><span>r.range.end_addr()); </span><span> </span><span style="color:#608b4e;">// transform to an iterator of frame start addresses </span><span> </span><span style="color:#569cd6;">let</span><span> frame_addresses = addr_ranges.flat_map(|r| r.step_by(</span><span style="color:#b5cea8;">4096</span><span>)); </span><span> </span><span style="color:#608b4e;">// create `PhysFrame` types from the start addresses </span><span> frame_addresses.map(|addr| PhysFrame::containing_address(PhysAddr::new(addr))) </span><span> } </span><span>} </span></code></pre> <p>This function uses iterator combinator methods to transform the initial <code>MemoryMap</code> into an iterator of usable physical frames:</p> <ul> <li>First, we call the <code>iter</code> method to convert the memory map to an iterator of <a href="https://docs.rs/bootloader/0.6.4/bootloader/bootinfo/struct.MemoryRegion.html"><code>MemoryRegion</code></a>s.</li> <li>Then we use the <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.filter"><code>filter</code></a> method to skip any reserved or otherwise unavailable regions. 
The bootloader updates the memory map for all the mappings it creates, so frames that are used by our kernel (code, data, or stack) or to store the boot information are already marked as <code>InUse</code> or similar. Thus, we can be sure that <code>Usable</code> frames are not used somewhere else.</li> <li>Afterwards, we use the <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.map"><code>map</code></a> combinator and Rust’s <a href="https://doc.rust-lang.org/core/ops/struct.Range.html">range syntax</a> to transform our iterator of memory regions to an iterator of address ranges.</li> <li>Next, we use <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.flat_map"><code>flat_map</code></a> to transform the address ranges into an iterator of frame start addresses, choosing every 4096th address using <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.step_by"><code>step_by</code></a>. Since 4096 bytes (= 4 KiB) is the page size, we get the start address of each frame. The bootloader page-aligns all usable memory areas so that we don’t need any alignment or rounding code here. By using <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.flat_map"><code>flat_map</code></a> instead of <code>map</code>, we get an <code>Iterator&lt;Item = u64&gt;</code> instead of an <code>Iterator&lt;Item = Iterator&lt;Item = u64&gt;&gt;</code>.</li> <li>Finally, we convert the start addresses to <code>PhysFrame</code> types to construct an <code>Iterator&lt;Item = PhysFrame&gt;</code>.</li> </ul> <p>The return type of the function uses the <a href="https://doc.rust-lang.org/book/ch10-02-traits.html#returning-types-that-implement-traits"><code>impl Trait</code></a> feature. 
This way, we can specify that we return some type that implements the <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html"><code>Iterator</code></a> trait with item type <code>PhysFrame</code> but don’t need to name the concrete return type. This is important here because we <em>can’t</em> name the concrete type since it depends on unnamable closure types.</p> <h4 id="implementing-the-frameallocator-trait"><a class="zola-anchor" href="#implementing-the-frameallocator-trait" aria-label="Anchor link for: implementing-the-frameallocator-trait">🔗</a>Implementing the <code>FrameAllocator</code> Trait</h4> <p>Now we can implement the <code>FrameAllocator</code> trait:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#569cd6;">unsafe impl </span><span>FrameAllocator&lt;Size4KiB&gt; </span><span style="color:#569cd6;">for </span><span>BootInfoFrameAllocator { </span><span> </span><span style="color:#569cd6;">fn </span><span>allocate_frame(</span><span style="color:#569cd6;">&amp;mut </span><span>self) -&gt; Option&lt;PhysFrame&gt; { </span><span> </span><span style="color:#569cd6;">let</span><span> frame = self.usable_frames().nth(self.next); </span><span> self.next += </span><span style="color:#b5cea8;">1</span><span>; </span><span> frame </span><span> } </span><span>} </span></code></pre> <p>We first use the <code>usable_frames</code> method to get an iterator of usable frames from the memory map. Then, we use the <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.nth"><code>Iterator::nth</code></a> function to get the frame with index <code>self.next</code> (thereby skipping the first <code>self.next</code> frames, since <code>nth</code> counts from zero). 
Before returning that frame, we increase <code>self.next</code> by one so that we return the following frame on the next call.</p> <p>This implementation is not quite optimal since it recreates the <code>usable_frames</code> iterator on every allocation. It would be better to directly store the iterator as a struct field instead. Then we wouldn’t need the <code>nth</code> method and could just call <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#tymethod.next"><code>next</code></a> on every allocation. The problem with this approach is that it’s not possible to store an <code>impl Trait</code> type in a struct field currently. It might work someday when <a href="https://github.com/rust-lang/rfcs/pull/2071"><em>named existential types</em></a> are fully implemented.</p> <h4 id="using-the-bootinfoframeallocator"><a class="zola-anchor" href="#using-the-bootinfoframeallocator" aria-label="Anchor link for: using-the-bootinfoframeallocator">🔗</a>Using the <code>BootInfoFrameAllocator</code></h4> <p>We can now modify our <code>kernel_main</code> function to pass a <code>BootInfoFrameAllocator</code> instance instead of an <code>EmptyFrameAllocator</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">fn </span><span>kernel_main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::memory::BootInfoFrameAllocator; </span><span> […] </span><span> </span><span style="color:#569cd6;">let mut</span><span> frame_allocator = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> BootInfoFrameAllocator::init(</span><span style="color:#569cd6;">&amp;</span><span>boot_info.memory_map) </span><span> }; </span><span> […] </span><span>} </span></code></pre> <p>With the boot info frame allocator, the mapping succeeds and we see the black-on-white <em>“New!”</em> on the screen again. Behind the scenes, the <code>map_to</code> method creates the missing page tables in the following way:</p> <ul> <li>Use the passed <code>frame_allocator</code> to allocate an unused frame.</li> <li>Zero the frame to create a new, empty page table.</li> <li>Map the entry of the higher level table to that frame.</li> <li>Continue with the next table level.</li> </ul> <p>While our <code>create_example_mapping</code> function is just some example code, we are now able to create new mappings for arbitrary pages. This will be essential for allocating memory or implementing multithreading in future posts.</p> <p>At this point, we should delete the <code>create_example_mapping</code> function again to avoid accidentally invoking undefined behavior, as explained <a href="https://os.phil-opp.com/paging-implementation/#a-create-example-mapping-function">above</a>.</p> <h2 id="summary"><a class="zola-anchor" href="#summary" aria-label="Anchor link for: summary">🔗</a>Summary</h2> <p>In this post we learned about different techniques to access the physical frames of page tables, including identity mapping, mapping of the complete physical memory, temporary mapping, and recursive page tables. 
We chose to map the complete physical memory since it’s simple, portable, and powerful.</p> <p>We can’t map the physical memory from our kernel without page table access, so we need support from the bootloader. The <code>bootloader</code> crate supports creating the required mapping through optional cargo crate features. It passes the required information to our kernel in the form of a <code>&amp;BootInfo</code> argument to our entry point function.</p> <p>For our implementation, we first manually traversed the page tables to implement a translation function, and then used the <code>MappedPageTable</code> type of the <code>x86_64</code> crate. We also learned how to create new mappings in the page table and how to create the necessary <code>FrameAllocator</code> on top of the memory map passed by the bootloader.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>The next post will create a heap memory region for our kernel, which will allow us to <a href="https://doc.rust-lang.org/alloc/boxed/struct.Box.html">allocate memory</a> and use various <a href="https://doc.rust-lang.org/alloc/collections/index.html">collection types</a>.</p> Advanced Paging Mon, 28 Jan 2019 00:00:00 +0000 https://os.phil-opp.com/advanced-paging/ https://os.phil-opp.com/advanced-paging/ <p>This post explains techniques to make the physical page table frames accessible to our kernel. It then uses such a technique to implement a function that translates virtual to physical addresses. It also explains how to create new mappings in the page tables.</p> <span id="continue-reading"></span> <p>This blog is openly developed on <a href="https://github.com/phil-opp/blog_os">GitHub</a>. If you have any problems or questions, please open an issue there. You can also leave comments <a href="https://os.phil-opp.com/advanced-paging/#comments">at the bottom</a>. 
The complete source code for this post can be found <a href="https://github.com/phil-opp/blog_os/tree/5c0fb63f33380fc8596d7166c2ebde03ef3d6726">here</a>.</p> <!-- fix for zola anchor checker (target is in template): <a id="comments"> --> <h2 id="introduction"><a class="zola-anchor" href="#introduction" aria-label="Anchor link for: introduction">🔗</a>Introduction</h2> <p>In the <a href="https://os.phil-opp.com/paging-introduction/">previous post</a> we learned about the principles of paging and how the 4-level page tables on x86_64 work. We also found out that the bootloader already set up a page table hierarchy for our kernel, which means that our kernel already runs on virtual addresses. This improves safety since illegal memory accesses cause page fault exceptions instead of modifying arbitrary physical memory.</p> <p>However, it also causes a problem when we try to access the page tables from our kernel because we can’t directly access the physical addresses that are stored in page table entries or the <code>CR3</code> register. We experienced that problem already <a href="https://os.phil-opp.com/paging-introduction/#accessing-the-page-tables">at the end of the previous post</a> when we tried to inspect the active page tables.</p> <p>The next section discusses the problem in detail and provides different approaches to a solution. Afterward, we implement a function that traverses the page table hierarchy in order to translate virtual to physical addresses. Finally, we learn how to create new mappings in the page tables and how to find unused memory frames for creating new page tables.</p> <h3 id="dependency-versions"><a class="zola-anchor" href="#dependency-versions" aria-label="Anchor link for: dependency-versions">🔗</a>Dependency Versions</h3> <p>This post requires version 0.3.12 of the <code>bootloader</code> dependency and version 0.5.0 of the <code>x86_64</code> dependency. 
You can set the dependency versions in your <code>Cargo.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">bootloader </span><span>= </span><span style="color:#d69d85;">&quot;0.3.12&quot; </span><span style="color:#569cd6;">x86_64 </span><span>= </span><span style="color:#d69d85;">&quot;0.5.0&quot; </span></code></pre> <h2 id="accessing-page-tables"><a class="zola-anchor" href="#accessing-page-tables" aria-label="Anchor link for: accessing-page-tables">🔗</a>Accessing Page Tables</h2> <p>Accessing the page tables from our kernel is not as easy as it may seem. To understand the problem let’s take a look at the example 4-level page table hierarchy of the previous post again:</p> <p><img src="../paging-introduction/x86_64-page-table-translation.svg" alt="An example 4-level page hierarchy with each page table shown in physical memory" /></p> <p>The important thing here is that each page entry stores the <em>physical</em> address of the next table. This avoids the need to run a translation for these addresses too, which would be bad for performance and could easily cause endless translation loops.</p> <p>The problem for us is that we can’t directly access physical addresses from our kernel since our kernel also runs on top of virtual addresses. For example when we access address <code>4 KiB</code>, we access the <em>virtual</em> address <code>4 KiB</code>, not the <em>physical</em> address <code>4 KiB</code> where the level 4 page table is stored. When we want to access the physical address <code>4 KiB</code>, we can only do so through some virtual address that maps to it.</p> <p>So in order to access page table frames, we need to map some virtual pages to them. 
There are different ways to create these mappings that all allow us to access arbitrary page table frames:</p> <ul> <li> <p>A simple solution is to <strong>identity map all page tables</strong>:</p> <p><img src="https://os.phil-opp.com/advanced-paging/identity-mapped-page-tables.svg" alt="A virtual and a physical address space with various virtual pages mapped to the physical frame with the same address" /></p> <p>In this example, we see various identity-mapped page table frames. This way the physical addresses of page tables are also valid virtual addresses so that we can easily access the page tables of all levels starting from the CR3 register.</p> <p>However, it clutters the virtual address space and makes it more difficult to find contiguous memory regions of larger sizes. For example, imagine that we want to create a virtual memory region of size 1000 KiB in the above graphic, e.g. for <a href="https://en.wikipedia.org/wiki/Memory-mapped_file">memory-mapping a file</a>. We can’t start the region at <code>28 KiB</code> because it would collide with the already mapped page at <code>1004 KiB</code>. So we have to look further until we find a large enough unmapped area, for example at <code>1008 KiB</code>. This is a fragmentation problem similar to the one we saw with <a href="https://os.phil-opp.com/paging-introduction/#fragmentation">segmentation</a>.</p> <p>Equally, it makes it much more difficult to create new page tables, because we need to find physical frames whose corresponding pages aren’t already in use. For example, let’s assume that we reserved the <em>virtual</em> 1000 KiB memory region starting at <code>1008 KiB</code> for our memory-mapped file. Now we can’t use any frame with a <em>physical</em> address between <code>1008 KiB</code> and <code>2008 KiB</code> anymore, because we can’t identity map it.</p> </li> <li> <p>Alternatively, we could <strong>map the page table frames only temporarily</strong> when we need to access them. 
To be able to create the temporary mappings we only need a single identity-mapped level 1 table:</p> <p><img src="https://os.phil-opp.com/advanced-paging/temporarily-mapped-page-tables.png" alt="A virtual and a physical address space with an identity mapped level 1 table, which maps its 0th entry to the level 2 table frame, thereby mapping that frame to page with address 0" /></p> <p>The level 1 table in this graphic controls the first 2 MiB of the virtual address space. This is because it is reachable by starting at the CR3 register and following the 0th entry in the level 4, level 3, and level 2 page tables. The entry with index <code>8</code> maps the virtual page at address <code>32 KiB</code> to the physical frame at address <code>32 KiB</code>, thereby identity mapping the level 1 table itself. The graphic shows this identity-mapping by the horizontal arrow at <code>32 KiB</code>.</p> <p>By writing to the identity-mapped level 1 table, our kernel can create up to 511 temporary mappings (512 minus the entry required for the identity mapping). In the above example, the kernel mapped the 0th entry of the level 1 table to the frame with address <code>24 KiB</code>. This created a temporary mapping of the virtual page at <code>0 KiB</code> to the physical frame of the level 2 page table, indicated by the dashed arrow. 
Now the kernel can access the level 2 page table by writing to the page starting at <code>0 KiB</code>.</p> <p>The process for accessing an arbitrary page table frame with temporary mappings would be:</p> <ul> <li>Search for a free entry in the identity-mapped level 1 table.</li> <li>Map that entry to the physical frame of the page table that we want to access.</li> <li>Access the target frame through the virtual page that maps to the entry.</li> <li>Set the entry back to unused thereby removing the temporary mapping again.</li> </ul> <p>This approach keeps the virtual address space clean since it reuses the same 512 virtual pages for creating the mappings. The drawback is that it is a bit cumbersome, especially since a new mapping might require modifications of multiple table levels, which means that we would need to repeat the above process multiple times.</p> </li> <li> <p>While both of the above approaches work, there is a third technique called <strong>recursive page tables</strong> that combines their advantages: It keeps all page table frames mapped at all times so that no temporary mappings are needed, and also keeps the mapped pages together to avoid fragmentation of the virtual address space. This is the technique that we will use for our implementation, therefore it is described in detail in the following section.</p> </li> </ul> <h3 id="recursive-page-tables"><a class="zola-anchor" href="#recursive-page-tables" aria-label="Anchor link for: recursive-page-tables">🔗</a>Recursive Page Tables</h3> <p>The idea behind this approach is to map some entry of the level 4 page table to the level 4 table itself. 
By doing this, we effectively reserve a part of the virtual address space and map all current and future page table frames to that space.</p> <p>Let’s go through an example to understand how this all works:</p> <p><img src="https://os.phil-opp.com/advanced-paging/recursive-page-table.png" alt="An example 4-level page hierarchy with each page table shown in physical memory. Entry 511 of the level 4 page is mapped to frame 4KiB, the frame of the level 4 table itself." /></p> <p>The only difference to the <a href="https://os.phil-opp.com/advanced-paging/#accessing-page-tables">example at the beginning of this post</a> is the additional entry at index <code>511</code> in the level 4 table, which is mapped to physical frame <code>4 KiB</code>, the frame of the level 4 table itself.</p> <p>By letting the CPU follow this entry on a translation, it doesn’t reach a level 3 table, but the same level 4 table again. This is similar to a recursive function that calls itself, therefore this table is called a <em>recursive page table</em>. The important thing is that the CPU assumes that every entry in the level 4 table points to a level 3 table, so it now treats the level 4 table as a level 3 table. This works because tables of all levels have the exact same layout on x86_64.</p> <p>By following the recursive entry one or multiple times before we start the actual translation, we can effectively shorten the number of levels that the CPU traverses. For example, if we follow the recursive entry once and then proceed to the level 3 table, the CPU thinks that the level 3 table is a level 2 table. Going further, it treats the level 2 table as a level 1 table and the level 1 table as the mapped frame. This means that we can now read and write the level 1 page table because the CPU thinks that it is the mapped frame. 
The graphic below illustrates the 5 translation steps:</p> <p><img src="https://os.phil-opp.com/advanced-paging/recursive-page-table-access-level-1.png" alt="The above example 4-level page hierarchy with 5 arrows: “Step 0” from CR3 to level 4 table, “Step 1” from level 4 table to level 4 table, “Step 2” from level 4 table to level 3 table, “Step 3” from level 3 table to level 2 table, and “Step 4” from level 2 table to level 1 table." /></p> <p>Similarly, we can follow the recursive entry twice before starting the translation to reduce the number of traversed levels to two:</p> <p><img src="https://os.phil-opp.com/advanced-paging/recursive-page-table-access-level-2.png" alt="The same 4-level page hierarchy with the following 4 arrows: “Step 0” from CR3 to level 4 table, “Steps 1&amp;2” from level 4 table to level 4 table, “Step 3” from level 4 table to level 3 table, and “Step 4” from level 3 table to level 2 table." /></p> <p>Let’s go through it step by step: First, the CPU follows the recursive entry on the level 4 table and thinks that it reaches a level 3 table. Then it follows the recursive entry again and thinks that it reaches a level 2 table. But in reality, it is still on the level 4 table. When the CPU now follows a different entry, it lands on a level 3 table but thinks it is already on a level 1 table. So while the next entry points at a level 2 table, the CPU thinks that it points to the mapped frame, which allows us to read and write the level 2 table.</p> <p>Accessing the tables of levels 3 and 4 works in the same way. For accessing the level 3 table, we follow the recursive entry three times, tricking the CPU into thinking it is already on a level 1 table. Then we follow another entry and reach a level 3 table, which the CPU treats as a mapped frame. 
For accessing the level 4 table itself, we just follow the recursive entry four times until the CPU treats the level 4 table itself as a mapped frame (in blue in the graphic below).</p> <p><img src="https://os.phil-opp.com/advanced-paging/recursive-page-table-access-level-3.png" alt="The same 4-level page hierarchy with the following 3 arrows: “Step 0” from CR3 to level 4 table, “Steps 1,2,3” from level 4 table to level 4 table, and “Step 4” from level 4 table to level 3 table. In blue the alternative “Steps 1,2,3,4” arrow from level 4 table to level 4 table." /></p> <p>It might take some time to wrap your head around the concept, but it works quite well in practice.</p> <h4 id="address-calculation"><a class="zola-anchor" href="#address-calculation" aria-label="Anchor link for: address-calculation">🔗</a>Address Calculation</h4> <p>We saw that we can access tables of all levels by following the recursive entry once or multiple times before the actual translation. Since the indexes into the tables of the four levels are derived directly from the virtual address, we need to construct special virtual addresses for this technique. Remember, the page table indexes are derived from the address in the following way:</p> <p><img src="../paging-introduction/x86_64-table-indices-from-address.svg" alt="Bits 0–12 are the page offset, bits 12–21 the level 1 index, bits 21–30 the level 2 index, bits 30–39 the level 3 index, and bits 39–48 the level 4 index" /></p> <p>Let’s assume that we want to access the level 1 page table that maps a specific page. As we learned above, this means that we have to follow the recursive entry one time before continuing with the level 4, level 3, and level 2 indexes. 
To do that we move each block of the address one block to the right and set the original level 4 index to the index of the recursive entry:</p> <p><img src="https://os.phil-opp.com/advanced-paging/table-indices-from-address-recursive-level-1.svg" alt="Bits 0–12 are the offset into the level 1 table frame, bits 12–21 the level 2 index, bits 21–30 the level 3 index, bits 30–39 the level 4 index, and bits 39–48 the index of the recursive entry" /></p> <p>For accessing the level 2 table of that page, we move each index block two blocks to the right and set both the blocks of the original level 4 index and the original level 3 index to the index of the recursive entry:</p> <p><img src="https://os.phil-opp.com/advanced-paging/table-indices-from-address-recursive-level-2.svg" alt="Bits 0–12 are the offset into the level 2 table frame, bits 12–21 the level 3 index, bits 21–30 the level 4 index, and bits 30–39 and bits 39–48 are the index of the recursive entry" /></p> <p>Accessing the level 3 table works by moving each block three blocks to the right and using the recursive index for the original level 4, level 3, and level 2 address blocks:</p> <p><img src="https://os.phil-opp.com/advanced-paging/table-indices-from-address-recursive-level-3.svg" alt="Bits 0–12 are the offset into the level 3 table frame, bits 12–21 the level 4 index, and bits 21–30, bits 30–39 and bits 39–48 are the index of the recursive entry" /></p> <p>Finally, we can access the level 4 table by moving each block four blocks to the right and using the recursive index for all address blocks except for the offset:</p> <p><img src="https://os.phil-opp.com/advanced-paging/table-indices-from-address-recursive-level-4.svg" alt="Bits 0–12 are the offset into the level 4 table frame and bits 12–21, bits 21–30, bits 30–39 and bits 39–48 are the index of the recursive entry" /></p> <p>We can now calculate virtual addresses for the page tables of all four levels. 
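</p> <p>To make the block shifting concrete, here is a minimal sketch of the calculation in plain Rust. It assumes that the recursive entry sits at index 511 (<code>0o777</code>) of the level 4 table. The names <code>RECURSIVE_INDEX</code>, <code>sign_extend</code>, and <code>table_addr</code> are made up for this illustration and are not part of any crate:</p>

```rust
/// Index of the recursive entry in the level 4 table.
/// Assumption for this sketch: the last entry (511 = 0o777).
const RECURSIVE_INDEX: u64 = 0o777;

/// Copies bit 47 into bits 48..64 so that the address is canonical.
fn sign_extend(addr: u64) -> u64 {
    if addr & (1 << 47) != 0 {
        addr | 0xffff_0000_0000_0000
    } else {
        addr & 0x0000_ffff_ffff_ffff
    }
}

/// Returns the virtual address of the page table of the given `level`
/// (1 to 4) that is used when translating `addr`.
fn table_addr(addr: u64, level: u32) -> u64 {
    assert!((1..=4).contains(&level), "level must be between 1 and 4");

    // keep the four 9-bit index blocks and drop the 12-bit page offset
    let indexes = (addr >> 12) & 0o777_777_777_777;

    // move the index blocks `level` blocks to the right
    let mut blocks = indexes >> (9 * level);

    // fill the freed blocks on the left with the recursive index
    for i in 0..level {
        blocks |= RECURSIVE_INDEX << (9 * (3 - i));
    }

    // add a zero offset (start of the table) and sign extend
    sign_extend(blocks << 12)
}
```

<p>With these assumptions, <code>table_addr(addr, 4)</code> yields <code>0xffff_ffff_ffff_f000</code> for every <code>addr</code>, because all four index blocks are then the recursive index.</p> <p>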
We can even calculate an address that points exactly to a specific page table entry by multiplying its index by 8, the size of a page table entry.</p> <p>The table below summarizes the address structure for accessing the different kinds of frames:</p> <table><thead><tr><th>Virtual Address for</th><th>Address Structure (<a href="https://en.wikipedia.org/wiki/Octal">octal</a>)</th></tr></thead><tbody> <tr><td>Page</td><td><code>0o_SSSSSS_AAA_BBB_CCC_DDD_EEEE</code></td></tr> <tr><td>Level 1 Table Entry</td><td><code>0o_SSSSSS_RRR_AAA_BBB_CCC_DDDD</code></td></tr> <tr><td>Level 2 Table Entry</td><td><code>0o_SSSSSS_RRR_RRR_AAA_BBB_CCCC</code></td></tr> <tr><td>Level 3 Table Entry</td><td><code>0o_SSSSSS_RRR_RRR_RRR_AAA_BBBB</code></td></tr> <tr><td>Level 4 Table Entry</td><td><code>0o_SSSSSS_RRR_RRR_RRR_RRR_AAAA</code></td></tr> </tbody></table> <p>Here, <code>AAA</code> is the level 4 index, <code>BBB</code> the level 3 index, <code>CCC</code> the level 2 index, <code>DDD</code> the level 1 index of the mapped frame, and <code>EEEE</code> the offset into it. <code>RRR</code> is the index of the recursive entry. An index (three digits) is transformed to an offset (four digits) by multiplying it by 8 (the size of a page table entry). With this offset, the resulting address directly points to the respective page table entry.</p> <p><code>SSSSSS</code> are sign extension bits, which means that they are all copies of bit 47. This is a special requirement for valid addresses on the x86_64 architecture. We explained it in the <a href="https://os.phil-opp.com/paging-introduction/#paging-on-x86-64">previous post</a>.</p> <p>We use <a href="https://en.wikipedia.org/wiki/Octal">octal</a> numbers for representing the addresses since each octal character represents three bits, which allows us to clearly separate the 9-bit indexes of the different page table levels. 
This isn’t possible with the hexadecimal system where each character represents four bits.</p> <h2 id="implementation"><a class="zola-anchor" href="#implementation" aria-label="Anchor link for: implementation">🔗</a>Implementation</h2> <p>After all this theory we can finally start our implementation. Conveniently, the bootloader not only created page tables for our kernel, but it also created a recursive mapping in the last entry of the level 4 table. The bootloader did this because otherwise there would be a <a href="https://en.wikipedia.org/wiki/Chicken_or_the_egg">chicken or egg problem</a>: We need to access the level 4 table to create a recursive mapping, but we can’t access it without some kind of mapping.</p> <p>We already used this recursive mapping <a href="https://os.phil-opp.com/paging-introduction/#accessing-the-page-tables">at the end of the previous post</a> to access the level 4 table. We did this through the hardcoded address <code>0xffff_ffff_ffff_f000</code>. When we convert this address to <a href="https://en.wikipedia.org/wiki/Octal">octal</a> and compare it with the above table, we can see that it exactly follows the structure of a level 4 table entry with <code>RRR</code> = <code>0o777</code>, <code>AAAA</code> = 0, and the sign extension bits set to <code>1</code> each:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>structure: 0o_SSSSSS_RRR_RRR_RRR_RRR_AAAA </span><span>address: 0o_177777_777_777_777_777_0000 </span></code></pre> <p>With our knowledge about recursive page tables we can now create virtual addresses to access all active page tables. 
This allows us to create a translation function in software.</p> <h3 id="translating-addresses"><a class="zola-anchor" href="#translating-addresses" aria-label="Anchor link for: translating-addresses">🔗</a>Translating Addresses</h3> <p>As a first step, let’s create a function that translates a virtual address to a physical address by walking the page table hierarchy:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">pub mod </span><span>memory; </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::PhysAddr; </span><span style="color:#569cd6;">use </span><span>x86_64::structures::paging::PageTable; </span><span> </span><span style="color:#608b4e;">/// Returns the physical address for the given virtual address, or `None` if the </span><span style="color:#608b4e;">/// virtual address is not mapped. 
</span><span style="color:#569cd6;">pub fn </span><span>translate_addr(addr: </span><span style="color:#569cd6;">usize</span><span>) -&gt; Option&lt;PhysAddr&gt; { </span><span> </span><span style="color:#608b4e;">// introduce variables for the recursive index and the sign extension bits </span><span> </span><span style="color:#608b4e;">// TODO: Don&#39;t hardcode these values </span><span> </span><span style="color:#569cd6;">let</span><span> r = </span><span style="color:#b5cea8;">0o777</span><span>; </span><span style="color:#608b4e;">// recursive index </span><span> </span><span style="color:#569cd6;">let</span><span> sign = </span><span style="color:#b5cea8;">0o177777 </span><span>&lt;&lt; </span><span style="color:#b5cea8;">48</span><span>; </span><span style="color:#608b4e;">// sign extension </span><span> </span><span> </span><span style="color:#608b4e;">// retrieve the page table indices of the address that we want to translate </span><span> </span><span style="color:#569cd6;">let</span><span> l4_idx = (addr &gt;&gt; </span><span style="color:#b5cea8;">39</span><span>) </span><span style="color:#569cd6;">&amp; </span><span style="color:#b5cea8;">0o777</span><span>; </span><span style="color:#608b4e;">// level 4 index </span><span> </span><span style="color:#569cd6;">let</span><span> l3_idx = (addr &gt;&gt; </span><span style="color:#b5cea8;">30</span><span>) </span><span style="color:#569cd6;">&amp; </span><span style="color:#b5cea8;">0o777</span><span>; </span><span style="color:#608b4e;">// level 3 index </span><span> </span><span style="color:#569cd6;">let</span><span> l2_idx = (addr &gt;&gt; </span><span style="color:#b5cea8;">21</span><span>) </span><span style="color:#569cd6;">&amp; </span><span style="color:#b5cea8;">0o777</span><span>; </span><span style="color:#608b4e;">// level 2 index </span><span> </span><span style="color:#569cd6;">let</span><span> l1_idx = (addr &gt;&gt; </span><span style="color:#b5cea8;">12</span><span>) </span><span 
style="color:#569cd6;">&amp; </span><span style="color:#b5cea8;">0o777</span><span>; </span><span style="color:#608b4e;">// level 1 index </span><span> </span><span style="color:#569cd6;">let</span><span> page_offset = addr </span><span style="color:#569cd6;">&amp; </span><span style="color:#b5cea8;">0o7777</span><span>; </span><span> </span><span> </span><span style="color:#608b4e;">// calculate the table addresses </span><span> </span><span style="color:#569cd6;">let</span><span> level_4_table_addr = </span><span> sign </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">39</span><span>) </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">30</span><span>) </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">21</span><span>) </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">12</span><span>); </span><span> </span><span style="color:#569cd6;">let</span><span> level_3_table_addr = </span><span> sign </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">39</span><span>) </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">30</span><span>) </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">21</span><span>) </span><span style="color:#569cd6;">| </span><span>(l4_idx &lt;&lt; </span><span style="color:#b5cea8;">12</span><span>); </span><span> </span><span style="color:#569cd6;">let</span><span> level_2_table_addr = </span><span> sign </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">39</span><span>) </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">30</span><span>) </span><span style="color:#569cd6;">| </span><span>(l4_idx &lt;&lt; </span><span 
style="color:#b5cea8;">21</span><span>) </span><span style="color:#569cd6;">| </span><span>(l3_idx &lt;&lt; </span><span style="color:#b5cea8;">12</span><span>); </span><span> </span><span style="color:#569cd6;">let</span><span> level_1_table_addr = </span><span> sign </span><span style="color:#569cd6;">| </span><span>(r &lt;&lt; </span><span style="color:#b5cea8;">39</span><span>) </span><span style="color:#569cd6;">| </span><span>(l4_idx &lt;&lt; </span><span style="color:#b5cea8;">30</span><span>) </span><span style="color:#569cd6;">| </span><span>(l3_idx &lt;&lt; </span><span style="color:#b5cea8;">21</span><span>) </span><span style="color:#569cd6;">| </span><span>(l2_idx &lt;&lt; </span><span style="color:#b5cea8;">12</span><span>); </span><span> </span><span> </span><span style="color:#608b4e;">// check that level 4 entry is mapped </span><span> </span><span style="color:#569cd6;">let</span><span> level_4_table = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;</span><span>*(level_4_table_addr </span><span style="color:#569cd6;">as *const</span><span> PageTable) }; </span><span> </span><span style="color:#569cd6;">if</span><span> level_4_table[l4_idx].addr().is_null() { </span><span> </span><span style="color:#569cd6;">return </span><span>None; </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">// check that level 3 entry is mapped </span><span> </span><span style="color:#569cd6;">let</span><span> level_3_table = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;</span><span>*(level_3_table_addr </span><span style="color:#569cd6;">as *const</span><span> PageTable) }; </span><span> </span><span style="color:#569cd6;">if</span><span> level_3_table[l3_idx].addr().is_null() { </span><span> </span><span style="color:#569cd6;">return </span><span>None; </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">// 
check that level 2 entry is mapped </span><span> </span><span style="color:#569cd6;">let</span><span> level_2_table = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;</span><span>*(level_2_table_addr </span><span style="color:#569cd6;">as *const</span><span> PageTable) }; </span><span> </span><span style="color:#569cd6;">if</span><span> level_2_table[l2_idx].addr().is_null() { </span><span> </span><span style="color:#569cd6;">return </span><span>None; </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">// check that level 1 entry is mapped and retrieve physical address from it </span><span> </span><span style="color:#569cd6;">let</span><span> level_1_table = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;</span><span>*(level_1_table_addr </span><span style="color:#569cd6;">as *const</span><span> PageTable) }; </span><span> </span><span style="color:#569cd6;">let</span><span> phys_addr = level_1_table[l1_idx].addr(); </span><span> </span><span style="color:#569cd6;">if</span><span> phys_addr.is_null() { </span><span> </span><span style="color:#569cd6;">return </span><span>None; </span><span> } </span><span> </span><span> Some(phys_addr + page_offset) </span><span>} </span></code></pre> <p>First, we introduce variables for the recursive index (511 = <code>0o777</code>) and the sign extension bits (which are 1 each). 
Then we calculate the page table indices and the page offset from the address through bitwise operations as specified in the graphic:</p> <p><img src="../paging-introduction/x86_64-table-indices-from-address.svg" alt="Bits 0–12 are the page offset, bits 12–21 the level 1 index, bits 21–30 the level 2 index, bits 30–39 the level 3 index, and bits 39–48 the level 4 index" /></p> <p>In the next step we calculate the virtual addresses of the four page tables as described in the <a href="https://os.phil-opp.com/advanced-paging/#address-calculation">address calculation</a> section. We transform each of these addresses to <a href="https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/page_table/struct.PageTable.html"><code>PageTable</code></a> references later in the function. These transformations are <code>unsafe</code> operations since the compiler can’t know that these addresses are valid.</p> <p>After the address calculation, we use the indexing operator to look at the entry in the level 4 table. If that entry is null, there is no level 3 table for this level 4 entry, which means that the <code>addr</code> is not mapped to any physical memory, so we return <code>None</code>. If the entry is not null, we know that a level 3 table exists. We then do the same cast and entry-checking as with the level 4 table.</p> <p>After we checked the three higher-level page tables, we can finally read the entry of the level 1 table that tells us the physical frame that the address is mapped to. As the last step, we add the page offset to that address and return it.</p> <p>If we knew that the address is mapped, we could directly access the level 1 table without looking at the higher-level page tables first. 
But since we don’t know this, we have to check whether the level 1 table exists first, otherwise our function would cause a page fault for unmapped addresses.</p> <h4 id="try-it-out"><a class="zola-anchor" href="#try-it-out" aria-label="Anchor link for: try-it-out">🔗</a>Try it out</h4> <p>We can use our new translation function to translate some virtual addresses in our <code>_start</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[cfg(not(test))] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> […] </span><span style="color:#608b4e;">// initialize GDT, IDT, PICS </span><span> </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::memory::translate_addr; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> addresses = [ </span><span> </span><span style="color:#608b4e;">// the identity-mapped vga buffer page </span><span> </span><span style="color:#b5cea8;">0xb8000</span><span>, </span><span> </span><span style="color:#608b4e;">// some code page </span><span> </span><span style="color:#b5cea8;">0x20010a</span><span>, </span><span> </span><span style="color:#608b4e;">// some stack page </span><span> </span><span style="color:#b5cea8;">0x57ac_001f_fe48</span><span>, </span><span> ]; </span><span> </span><span> </span><span style="color:#569cd6;">for &amp;</span><span>address </span><span style="color:#569cd6;">in &amp;</span><span>addresses { </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;"> -&gt; </span><span 
style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, address, translate_addr(address)); </span><span> } </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> blog_os::hlt_loop(); </span><span>} </span></code></pre> <p>When we run it, we see the following output:</p> <p><img src="https://os.phil-opp.com/advanced-paging/qemu-translate-addr.png" alt="0xb8000 -&gt; 0xb8000, 0x20010a -&gt; 0x40010a, 0x57ac001ffe48 -&gt; 0x27be48" /></p> <p>As expected, the identity-mapped address <code>0xb8000</code> translates to the same physical address. The code page and the stack page translate to some arbitrary physical addresses, which depend on how the bootloader created the initial mapping for our kernel.</p> <h4 id="the-recursivepagetable-type"><a class="zola-anchor" href="#the-recursivepagetable-type" aria-label="Anchor link for: the-recursivepagetable-type">🔗</a>The <code>RecursivePageTable</code> Type</h4> <p>The <code>x86_64</code> crate provides a <a href="https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/struct.RecursivePageTable.html"><code>RecursivePageTable</code></a> type that implements safe abstractions for various page table operations. The type implements the <a href="https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/mapper/trait.MapperAllSizes.html"><code>MapperAllSizes</code></a> trait, which already contains a <code>translate_addr</code> function that we can use instead of hand-rolling our own. 
To create a new <code>RecursivePageTable</code>, we create a <code>memory::init</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::paging::{Mapper, Page, PageTable, RecursivePageTable}; </span><span style="color:#569cd6;">use </span><span>x86_64::{VirtAddr, PhysAddr}; </span><span> </span><span style="color:#608b4e;">/// Creates a RecursivePageTable instance from the level 4 address. </span><span style="color:#608b4e;">/// </span><span style="color:#608b4e;">/// This function is unsafe because it can break memory safety if an invalid </span><span style="color:#608b4e;">/// address is passed. </span><span style="color:#569cd6;">pub unsafe fn </span><span>init(level_4_table_addr: </span><span style="color:#569cd6;">usize</span><span>) -&gt; RecursivePageTable&lt;</span><span style="color:#569cd6;">&#39;static</span><span>&gt; { </span><span> </span><span style="color:#569cd6;">let</span><span> level_4_table_ptr = level_4_table_addr </span><span style="color:#569cd6;">as *mut</span><span> PageTable; </span><span> </span><span style="color:#569cd6;">let</span><span> level_4_table = </span><span style="color:#569cd6;">&amp;mut </span><span>*level_4_table_ptr; </span><span> RecursivePageTable::new(level_4_table).unwrap() </span><span>} </span></code></pre> <p>The <code>RecursivePageTable</code> type encapsulates the unsafety of the page table walk completely so that we no longer need <code>unsafe</code> to implement our own <code>translate_addr</code> function. 
The <code>init</code> function needs to be unsafe because the caller has to guarantee that the passed <code>level_4_table_addr</code> is valid.</p> <p>We can now use the <code>MapperAllSizes::translate_addr</code> function in our <code>_start</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[cfg(not(test))] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> […] </span><span style="color:#608b4e;">// initialize GDT, IDT, PICS </span><span> </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::memory; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::{ </span><span> structures::paging::MapperAllSizes, </span><span> VirtAddr, </span><span> }; </span><span> </span><span> </span><span style="color:#569cd6;">const </span><span style="color:#b4cea8;">LEVEL_4_TABLE_ADDR</span><span>: </span><span style="color:#569cd6;">usize </span><span>= </span><span style="color:#b5cea8;">0o_177777_777_777_777_777_0000</span><span>; </span><span> </span><span style="color:#569cd6;">let</span><span> recursive_page_table = </span><span style="color:#569cd6;">unsafe </span><span>{ memory::init(</span><span style="color:#b4cea8;">LEVEL_4_TABLE_ADDR</span><span>) }; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> addresses = […]; </span><span style="color:#608b4e;">// as before </span><span> </span><span style="color:#569cd6;">for &amp;</span><span>address </span><span style="color:#569cd6;">in &amp;</span><span>addresses { </span><span> </span><span style="color:#569cd6;">let</span><span> virt_addr = VirtAddr::new(address); 
</span><span> </span><span style="color:#569cd6;">let</span><span> phys_addr = recursive_page_table.translate_addr(virt_addr); </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;"> -&gt; </span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, virt_addr, phys_addr); </span><span> } </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> blog_os::hlt_loop(); </span><span>} </span></code></pre> <p>Instead of using <code>usize</code> for all addresses as in our handwritten function, we now use the <a href="https://docs.rs/x86_64/0.5.2/x86_64/struct.VirtAddr.html"><code>VirtAddr</code></a> and <a href="https://docs.rs/x86_64/0.5.2/x86_64/struct.PhysAddr.html"><code>PhysAddr</code></a> wrapper types to differentiate the two kinds of addresses. In order to be able to call the <code>translate_addr</code> method, we need to import the <code>MapperAllSizes</code> trait.</p> <p>By using the <code>RecursivePageTable</code> type, we now have a safe abstraction and clear ownership semantics. This ensures that we can’t accidentally modify the page table concurrently, because an exclusive borrow of the <code>RecursivePageTable</code> is needed in order to modify it.</p> <p>When we run it, we see the same result as with our handcrafted translation function.</p> <h4 id="making-unsafe-functions-safer"><a class="zola-anchor" href="#making-unsafe-functions-safer" aria-label="Anchor link for: making-unsafe-functions-safer">🔗</a>Making Unsafe Functions Safer</h4> <p>Our <code>memory::init</code> function is an <a href="https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html#calling-an-unsafe-function-or-method">unsafe function</a>, which means that an <code>unsafe</code> block is required for calling it because the caller has to guarantee that certain requirements are met. 
In our case, the requirement is that the passed address is mapped to the physical frame of the level 4 page table.</p> <p>The second property of unsafe functions is that their complete body is treated as an <code>unsafe</code> block, which means that it can perform all kinds of unsafe operations without additional unsafe blocks. This is the reason that we didn’t need an <code>unsafe</code> block for dereferencing the raw <code>level_4_table_ptr</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub unsafe fn </span><span>init(level_4_table_addr: </span><span style="color:#569cd6;">usize</span><span>) -&gt; RecursivePageTable&lt;</span><span style="color:#569cd6;">&#39;static</span><span>&gt; { </span><span> </span><span style="color:#569cd6;">let</span><span> level_4_table_ptr = level_4_table_addr </span><span style="color:#569cd6;">as *mut</span><span> PageTable; </span><span> </span><span style="color:#569cd6;">let</span><span> level_4_table = </span><span style="color:#569cd6;">&amp;mut </span><span>*level_4_table_ptr; </span><span style="color:#608b4e;">// &lt;- this operation is unsafe </span><span> RecursivePageTable::new(level_4_table).unwrap() </span><span>} </span></code></pre> <p>The problem with this is that we don’t immediately see which parts are unsafe. For example, we don’t know whether the <code>RecursivePageTable::new</code> function is unsafe or not without looking at <a href="https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/struct.RecursivePageTable.html#method.new">its definition</a>. 
This makes it very easy to accidentally do something unsafe without noticing.</p> <p>To avoid this problem, we can add a safe inner function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#569cd6;">pub unsafe fn </span><span>init(level_4_table_addr: </span><span style="color:#569cd6;">usize</span><span>) -&gt; RecursivePageTable&lt;</span><span style="color:#569cd6;">&#39;static</span><span>&gt; { </span><span> </span><span style="color:#608b4e;">/// Rust currently treats the whole body of unsafe functions as an unsafe </span><span> </span><span style="color:#608b4e;">/// block, which makes it difficult to see which operations are unsafe. To </span><span> </span><span style="color:#608b4e;">/// limit the scope of unsafe we use a safe inner function. </span><span> </span><span style="color:#569cd6;">fn </span><span>init_inner(level_4_table_addr: </span><span style="color:#569cd6;">usize</span><span>) -&gt; RecursivePageTable&lt;</span><span style="color:#569cd6;">&#39;static</span><span>&gt; { </span><span> </span><span style="color:#569cd6;">let</span><span> level_4_table_ptr = level_4_table_addr </span><span style="color:#569cd6;">as *mut</span><span> PageTable; </span><span> </span><span style="color:#569cd6;">let</span><span> level_4_table = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;mut </span><span>*level_4_table_ptr }; </span><span> RecursivePageTable::new(level_4_table).unwrap() </span><span> } </span><span> </span><span> init_inner(level_4_table_addr) </span><span>} </span></code></pre> <p>Now an <code>unsafe</code> block is required again for dereferencing the <code>level_4_table_ptr</code> and we immediately see that this is the only unsafe operation in the function. 
There is currently an open <a href="https://github.com/rust-lang/rfcs/pull/2585">RFC</a> to change this unfortunate property of unsafe functions that would allow us to avoid the above boilerplate.</p> <h3 id="creating-a-new-mapping"><a class="zola-anchor" href="#creating-a-new-mapping" aria-label="Anchor link for: creating-a-new-mapping">🔗</a>Creating a new Mapping</h3> <p>After reading the page tables and creating a translation function, the next step is to create a new mapping in the page table hierarchy.</p> <p>The difficulty of creating a new mapping depends on the virtual page that we want to map. In the easiest case, the level 1 page table for the page already exists and we just need to write a single entry. In the most difficult case, the page is in a memory region for which no level 3 table exists yet, so we need to create new level 3, level 2 and level 1 page tables first.</p> <p>Let’s start with the simple case and assume that we don’t need to create new page tables. The bootloader loads itself in the first megabyte of the virtual address space, so we know that a valid level 1 table exists for this region. We can choose any unused page in this memory region for our example mapping, for example, the page at address <code>0x1000</code>. As the target frame we use <code>0xb8000</code>, the frame of the VGA text buffer. 
This way we can easily test whether our mapping worked.</p> <p>We implement it in a new <code>create_example_mapping</code> function in our <code>memory</code> module:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::paging::{FrameAllocator, PhysFrame, Size4KiB}; </span><span> </span><span style="color:#569cd6;">pub fn </span><span>create_example_mapping( </span><span> recursive_page_table: </span><span style="color:#569cd6;">&amp;mut</span><span> RecursivePageTable, </span><span> frame_allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> impl FrameAllocator&lt;Size4KiB&gt;, </span><span>) { </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::paging::PageTableFlags </span><span style="color:#569cd6;">as</span><span> Flags; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> page: Page = Page::containing_address(VirtAddr::new(</span><span style="color:#b5cea8;">0x1000</span><span>)); </span><span> </span><span style="color:#569cd6;">let</span><span> frame = PhysFrame::containing_address(PhysAddr::new(</span><span style="color:#b5cea8;">0xb8000</span><span>)); </span><span> </span><span style="color:#569cd6;">let</span><span> flags = Flags::</span><span style="color:#b4cea8;">PRESENT </span><span style="color:#569cd6;">| </span><span>Flags::</span><span style="color:#b4cea8;">WRITABLE</span><span>; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> map_to_result = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> recursive_page_table.map_to(page, frame, flags, frame_allocator) </span><span> }; </span><span> map_to_result.expect(</span><span style="color:#d69d85;">&quot;map_to failed&quot;</span><span>).flush(); 
</span><span>} </span></code></pre> <p>The function takes a mutable reference to the <code>RecursivePageTable</code> because it needs to modify it and a <code>FrameAllocator</code> that is explained below. It then uses the <a href="https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/mapper/trait.Mapper.html#tymethod.map_to"><code>map_to</code></a> function of the <a href="https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/mapper/trait.Mapper.html"><code>Mapper</code></a> trait to map the page at address <code>0x1000</code> to the physical frame at address <code>0xb8000</code>. The function is unsafe because it’s possible to break memory safety with invalid arguments.</p> <p>Apart from the <code>page</code> and <code>frame</code> arguments, the <a href="https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/mapper/trait.Mapper.html#tymethod.map_to"><code>map_to</code></a> function takes two more arguments. The third argument is a set of flags for the page table entry. We set the <code>PRESENT</code> flag because it is required for all valid entries and the <code>WRITABLE</code> flag to make the mapped page writable.</p> <p>The fourth argument needs to be some structure that implements the <a href="https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/trait.FrameAllocator.html"><code>FrameAllocator</code></a> trait. The <code>map_to</code> method needs this argument because it might need unused frames for creating new page tables. 
The <code>Size4KiB</code> argument in the trait implementation is needed because the <a href="https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/page/struct.Page.html"><code>Page</code></a> and <a href="https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/frame/struct.PhysFrame.html"><code>PhysFrame</code></a> types are <a href="https://doc.rust-lang.org/book/ch10-00-generics.html">generic</a> over the <a href="https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/page/trait.PageSize.html"><code>PageSize</code></a> trait to work with both standard 4KiB pages and huge 2MiB/1GiB pages.</p> <p>The <code>map_to</code> function can fail, so it returns a <a href="https://doc.rust-lang.org/core/result/enum.Result.html"><code>Result</code></a>. Since this is just some example code that does not need to be robust, we just use <a href="https://doc.rust-lang.org/core/result/enum.Result.html#method.expect"><code>expect</code></a> to panic when an error occurs. On success, the function returns a <a href="https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/mapper/struct.MapperFlush.html"><code>MapperFlush</code></a> type that provides an easy way to flush the newly mapped page from the translation lookaside buffer (TLB) with its <a href="https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/mapper/struct.MapperFlush.html#method.flush"><code>flush</code></a> method. Like <code>Result</code>, the type uses the <a href="https://doc.rust-lang.org/std/result/#results-must-be-used"><code>#[must_use]</code></a> attribute to emit a warning when we accidentally forget to use it.</p> <p>Since we know that no new page tables are required for the address <code>0x1000</code>, a frame allocator that always returns <code>None</code> suffices. 
We create such an <code>EmptyFrameAllocator</code> for testing our mapping function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#608b4e;">/// A FrameAllocator that always returns `None`. </span><span style="color:#569cd6;">pub struct </span><span>EmptyFrameAllocator; </span><span> </span><span style="color:#569cd6;">impl </span><span>FrameAllocator&lt;Size4KiB&gt; </span><span style="color:#569cd6;">for </span><span>EmptyFrameAllocator { </span><span> </span><span style="color:#569cd6;">fn </span><span>allocate_frame(</span><span style="color:#569cd6;">&amp;mut </span><span>self) -&gt; Option&lt;PhysFrame&gt; { </span><span> None </span><span> } </span><span>} </span></code></pre> <p>(If you’re getting a ‘method <code>allocate_frame</code> is not a member of trait <code>FrameAllocator</code>’ error, you need to update <code>x86_64</code> to version 0.4.0.)</p> <p>We can now test the new mapping function in our <code>main.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[cfg(not(test))] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> […] </span><span style="color:#608b4e;">// initialize GDT, IDT, PICS </span><span> </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::memory::{create_example_mapping, EmptyFrameAllocator}; </span><span> </span><span> </span><span style="color:#569cd6;">const </span><span style="color:#b4cea8;">LEVEL_4_TABLE_ADDR</span><span>: </span><span style="color:#569cd6;">usize </span><span>= </span><span style="color:#b5cea8;">0o_177777_777_777_777_777_0000</span><span>; </span><span> </span><span style="color:#569cd6;">let mut</span><span> recursive_page_table = </span><span style="color:#569cd6;">unsafe </span><span>{ memory::init(</span><span style="color:#b4cea8;">LEVEL_4_TABLE_ADDR</span><span>) }; </span><span> </span><span> create_example_mapping(</span><span style="color:#569cd6;">&amp;mut</span><span> recursive_page_table, </span><span style="color:#569cd6;">&amp;mut</span><span> EmptyFrameAllocator); </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ (</span><span style="color:#b5cea8;">0x1900 </span><span style="color:#569cd6;">as *mut u64</span><span>).write_volatile(</span><span style="color:#b5cea8;">0xf021_f077_f065_f04e</span><span>)}; </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> blog_os::hlt_loop(); </span><span>} </span></code></pre> <p>We first create the mapping for the page at <code>0x1000</code> by calling our <code>create_example_mapping</code> function with a mutable reference to the <code>RecursivePageTable</code> instance. This maps the page <code>0x1000</code> to the VGA text buffer, so we should see any write to it on the screen.</p> <p>Then we write the value <code>0xf021_f077_f065_f04e</code> to this page, which represents the string <em>“New!”</em> on a white background. 
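</p> <p>To see why this value shows up as <em>“New!”</em>, it helps to decode it by hand. The following is a minimal, standalone sketch meant to run on a normal host rather than in the kernel. It assumes the VGA text mode layout of one ASCII byte followed by one attribute byte per cell, and the helper names <code>decode_vga_cells</code> and <code>decode_vga_text</code> are made up for this illustration:</p>

```rust
/// Splits a `u64` into the four VGA text cells it occupies in memory.
/// Each cell is (ASCII character, attribute byte). Hypothetical helper
/// for illustration only; it is not part of `blog_os`.
pub fn decode_vga_cells(value: u64) -> [(char, u8); 4] {
    // x86_64 stores the least significant byte at the lowest address.
    let bytes = value.to_le_bytes();
    let mut cells = [(' ', 0u8); 4];
    for (i, cell) in cells.iter_mut().enumerate() {
        // even byte: character code, odd byte: color attribute
        *cell = (bytes[2 * i] as char, bytes[2 * i + 1]);
    }
    cells
}

/// Collects only the characters of the decoded cells.
pub fn decode_vga_text(value: u64) -> String {
    decode_vga_cells(value).iter().map(|(c, _)| *c).collect()
}
```

<p>Because x86_64 is little-endian, the lowest byte of the <code>u64</code> lands at the lowest address, so the four cells appear on screen in reading order. Every cell carries the same attribute byte <code>0xf0</code>.</p> <p>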
We don’t write directly to the beginning of the page at <code>0x1000</code> since the top line is immediately shifted off the screen by the next <code>println</code>. Instead, we write to offset <code>0x900</code>, which is roughly in the middle of the screen. As we learned <a href="https://os.phil-opp.com/vga-text-mode/#volatile">in the <em>“VGA Text Mode”</em> post</a>, writes to the VGA buffer should be volatile, so we use the <a href="https://doc.rust-lang.org/std/primitive.pointer.html#method.write_volatile"><code>write_volatile</code></a> method.</p> <p>When we run it in QEMU, we see the following output:</p> <p><img src="https://os.phil-opp.com/advanced-paging/qemu-new-mapping.png" alt="QEMU printing “It did not crash!” with four completely white cells in the middle of the screen" /></p> <p>The <em>“New!”</em> on the screen is caused by our write to <code>0x1900</code>, which means that we successfully created a new mapping in the page tables.</p> <p>This only worked because there was already a level 1 table for mapping page <code>0x1000</code>. When we try to map a page for which no level 1 table exists yet, the <code>map_to</code> function fails because it tries to allocate frames from the <code>EmptyFrameAllocator</code> for creating new page tables. 
We can see that happen when we try to map page <code>0xdeadbeaf000</code> instead of <code>0x1000</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>create_example_mapping(…) { </span><span> […] </span><span> </span><span style="color:#569cd6;">let</span><span> page: Page = Page::containing_address(VirtAddr::new(</span><span style="color:#b5cea8;">0xdeadbeaf000</span><span>)); </span><span> […] </span><span>} </span><span> </span><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> […] </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ (</span><span style="color:#b5cea8;">0xdeadbeaf900 </span><span style="color:#569cd6;">as *mut u64</span><span>).write_volatile(</span><span style="color:#b5cea8;">0xf021_f077_f065_f04e</span><span>)}; </span><span> […] </span><span>} </span></code></pre> <p>When we run it, a panic with the following error message occurs:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>panicked at &#39;map_to failed: FrameAllocationFailed&#39;, /…/result.rs:999:5 </span></code></pre> <p>To map pages that don’t have a level 1 page table yet we need to create a proper <code>FrameAllocator</code>. 
But how do we know which frames are unused and how much physical memory is available?</p> <h3 id="boot-information"><a class="zola-anchor" href="#boot-information" aria-label="Anchor link for: boot-information">🔗</a>Boot Information</h3> <p>The amount of physical memory and the memory regions reserved by devices like the VGA hardware vary between different machines. Only the BIOS or UEFI firmware knows exactly which memory regions can be used by the operating system and which regions are reserved. Both firmware standards provide functions to retrieve the memory map, but they can only be called very early in the boot process. For this reason, the bootloader already queries this and other information from the firmware.</p> <p>To communicate this information to our kernel, the bootloader passes a reference to a boot information structure as an argument when calling our <code>_start</code> function. Right now we don’t have this argument declared in our function, so it is ignored. Let’s add it:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">use </span><span>bootloader::bootinfo::BootInfo; </span><span> </span><span>#[cfg(not(test))] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span style="color:#608b4e;">// new argument </span><span> […] </span><span>} </span></code></pre> <p>The <code>BootInfo</code> struct is still in an early stage, so expect some breakage when updating to future <a href="https://doc.rust-lang.org/stable/cargo/reference/specifying-dependencies.html#caret-requirements">semver-incompatible</a> bootloader versions. It currently has three fields: <code>p4_table_addr</code>, <code>memory_map</code>, and <code>package</code>.</p> <ul> <li>The <code>p4_table_addr</code> field contains the recursive virtual address of the level 4 page table. By using this field we can avoid hardcoding the address <code>0o_177777_777_777_777_777_0000</code>.</li> <li>The <code>memory_map</code> field is most interesting to us since it contains a list of all memory regions and their type (i.e. unused, reserved, or other).</li> <li>The <code>package</code> field is an in-progress feature to bundle additional data with the bootloader. The implementation is not finished, so we can ignore this field for now.</li> </ul> <p>Before we use the <code>memory_map</code> field to create a proper <code>FrameAllocator</code>, we want to ensure that we can’t use a <code>boot_info</code> argument of the wrong type.</p> <h4 id="the-entry-point-macro"><a class="zola-anchor" href="#the-entry-point-macro" aria-label="Anchor link for: the-entry-point-macro">🔗</a>The <code>entry_point</code> Macro</h4> <p>Since our <code>_start</code> function is called externally from the bootloader, no checking of our function signature occurs. This means that we could let it take arbitrary arguments without any compilation errors, but it would fail or cause undefined behavior at runtime.</p> <p>To make sure that the entry point function always has the correct signature that the bootloader expects, the <code>bootloader</code> crate provides an <code>entry_point</code> macro that offers a type-checked way to define a Rust function as the entry point. 
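</p> <p>One way a macro can enforce a signature is to assign the user’s function to a variable of the expected function pointer type inside the function it generates. The following is a simplified, host-runnable sketch of that idea and not the <code>bootloader</code> crate’s actual source: the stand-in <code>BootInfo</code> struct, the <code>typed_entry_point</code> macro, the <code>start</code> wrapper, and the <code>u64</code> return type (used instead of <code>!</code> so that the result can be inspected) are all assumptions made for this illustration.</p>

```rust
/// Stand-in for the boot information struct (hypothetical, for illustration).
pub struct BootInfo {
    pub p4_table_addr: u64,
}

/// Defines a wrapper function `$wrapper` that forwards to `$path`.
/// The `let f: fn(...) -> ... = $path;` line is the type check: if `$path`
/// has a different signature, the assignment fails to compile.
macro_rules! typed_entry_point {
    ($wrapper:ident, $path:path) => {
        pub fn $wrapper(boot_info: &'static BootInfo) -> u64 {
            let f: fn(&'static BootInfo) -> u64 = $path;
            f(boot_info)
        }
    };
}

/// An ordinary Rust function with the expected signature.
fn kernel_main(boot_info: &'static BootInfo) -> u64 {
    boot_info.p4_table_addr
}

// Generates `pub fn start(...)`, which calls `kernel_main`.
typed_entry_point!(start, kernel_main);

/// A `'static` instance so that the wrapper can be called in a test.
pub static EXAMPLE_BOOT_INFO: BootInfo = BootInfo {
    p4_table_addr: 0o177777_777_777_777_777_0000,
};
```

<p>Changing the parameter or return type of <code>kernel_main</code> in this sketch makes the assignment inside the macro fail to compile, which is the kind of check we want for our real entry point.</p> <p>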
Let’s rewrite our entry point function to use this macro:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">use </span><span>bootloader::{bootinfo::BootInfo, entry_point}; </span><span> </span><span>entry_point!(kernel_main); </span><span> </span><span>#[cfg(not(test))] </span><span style="color:#569cd6;">fn </span><span>kernel_main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> […] </span><span style="color:#608b4e;">// initialize GDT, IDT, PICS </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> recursive_page_table = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> memory::init(boot_info.p4_table_addr </span><span style="color:#569cd6;">as usize</span><span>) </span><span> }; </span><span> </span><span> […] </span><span style="color:#608b4e;">// create and test example mapping </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> blog_os::hlt_loop(); </span><span>} </span></code></pre> <p>We no longer need to use <code>extern "C"</code> or <code>no_mangle</code> for our entry point, as the macro defines the real lower level <code>_start</code> entry point for us. The <code>kernel_main</code> function is now a completely normal Rust function, so we can choose an arbitrary name for it. The important thing is that it is type-checked so that a compilation error occurs when we now try to modify the function signature in any way, for example adding an argument or changing the argument type.</p> <p>Note that we now pass <code>boot_info.p4_table_addr</code> instead of a hardcoded address to our <code>memory::init</code>. 
Thus our code continues to work even if a future version of the bootloader chooses a different entry of the level 4 page table for the recursive mapping.</p> <h3 id="allocating-frames"><a class="zola-anchor" href="#allocating-frames" aria-label="Anchor link for: allocating-frames">🔗</a>Allocating Frames</h3> <p>Now that we have access to the memory map through the boot information we can create a proper frame allocator on top. We start with a generic skeleton:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#569cd6;">pub struct </span><span>BootInfoFrameAllocator&lt;I&gt; where I: Iterator&lt;Item = PhysFrame&gt; { </span><span> frames: I, </span><span>} </span><span> </span><span style="color:#569cd6;">impl</span><span>&lt;I&gt; FrameAllocator&lt;Size4KiB&gt; </span><span style="color:#569cd6;">for </span><span>BootInfoFrameAllocator&lt;I&gt; </span><span> </span><span style="color:#569cd6;">where</span><span> I: Iterator&lt;Item = PhysFrame&gt; </span><span>{ </span><span> </span><span style="color:#569cd6;">fn </span><span>allocate_frame(</span><span style="color:#569cd6;">&amp;mut </span><span>self) -&gt; Option&lt;PhysFrame&gt; { </span><span> self.frames.next() </span><span> } </span><span>} </span></code></pre> <p>The <code>frames</code> field can be initialized with an arbitrary <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html"><code>Iterator</code></a> of frames. 
This allows us to just delegate <code>allocate_frame</code> calls to the <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#tymethod.next"><code>Iterator::next</code></a> method.</p> <p>The initialization of the <code>BootInfoFrameAllocator</code> happens in a new <code>init_frame_allocator</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory.rs </span><span> </span><span style="color:#569cd6;">use </span><span>bootloader::bootinfo::{MemoryMap, MemoryRegionType}; </span><span> </span><span style="color:#608b4e;">/// Create a FrameAllocator from the passed memory map </span><span style="color:#569cd6;">pub fn </span><span>init_frame_allocator( </span><span> memory_map: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> MemoryMap, </span><span>) -&gt; BootInfoFrameAllocator&lt;impl Iterator&lt;Item = PhysFrame&gt;&gt; { </span><span> </span><span style="color:#608b4e;">// get usable regions from memory map </span><span> </span><span style="color:#569cd6;">let</span><span> regions = memory_map </span><span> .iter() </span><span> .filter(|r| r.region_type == MemoryRegionType::Usable); </span><span> </span><span style="color:#608b4e;">// map each region to its address range </span><span> </span><span style="color:#569cd6;">let</span><span> addr_ranges = regions.map(|r| r.range.start_addr()</span><span style="color:#569cd6;">..</span><span>r.range.end_addr()); </span><span> </span><span style="color:#608b4e;">// transform to an iterator of frame start addresses </span><span> </span><span style="color:#569cd6;">let</span><span> frame_addresses = addr_ranges.flat_map(|r| r.into_iter().step_by(</span><span style="color:#b5cea8;">4096</span><span>)); </span><span> </span><span style="color:#608b4e;">// create `PhysFrame` types from the start addresses </span><span> </span><span 
style="color:#569cd6;">let</span><span> frames = frame_addresses.map(|addr| { </span><span> PhysFrame::containing_address(PhysAddr::new(addr)) </span><span> }); </span><span> </span><span> BootInfoFrameAllocator { frames } </span><span>} </span></code></pre> <p>This function uses iterator combinator methods to transform the initial <code>MemoryMap</code> into an iterator of usable physical frames:</p> <ul> <li>First, we call the <code>iter</code> method to convert the memory map to an iterator of <code>MemoryRegion</code>s. Then we use the <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.filter"><code>filter</code></a> method to skip any reserved or otherwise unavailable regions. The bootloader updates the memory map for all the mappings it creates, so frames that are used by our kernel (code, data or stack) or to store the boot information are already marked as <code>InUse</code> or similar. Thus we can be sure that <code>Usable</code> frames are not used somewhere else.</li> <li>In the second step, we use the <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.map"><code>map</code></a> combinator and Rust’s <a href="https://doc.rust-lang.org/core/ops/struct.Range.html">range syntax</a> to transform our iterator of memory regions to an iterator of address ranges.</li> <li>The third step is the most complicated: We convert each range to an iterator through the <code>into_iter</code> method and then choose every 4096th address using <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.step_by"><code>step_by</code></a>. Since 4096 bytes (= 4 KiB) is the page size, we get the start address of each frame. The bootloader page aligns all usable memory areas so that we don’t need any alignment or rounding code here. 
By using <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.flat_map"><code>flat_map</code></a> instead of <code>map</code>, we get an <code>Iterator&lt;Item = u64&gt;</code> instead of an <code>Iterator&lt;Item = Iterator&lt;Item = u64&gt;&gt;</code>.</li> <li>In the final step, we convert the start addresses to <code>PhysFrame</code> types to construct the desired <code>Iterator&lt;Item = PhysFrame&gt;</code>. We then use this iterator to create and return a new <code>BootInfoFrameAllocator</code>.</li> </ul> <p>We can now modify our <code>kernel_main</code> function to pass a <code>BootInfoFrameAllocator</code> instance instead of an <code>EmptyFrameAllocator</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[cfg(not(test))] </span><span style="color:#569cd6;">fn </span><span>kernel_main(boot_info: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> BootInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> […] </span><span style="color:#608b4e;">// initialize GDT, IDT, PICS </span><span> </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::paging::{PageTable, RecursivePageTable}; </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> recursive_page_table = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> memory::init(boot_info.p4_table_addr </span><span style="color:#569cd6;">as usize</span><span>) </span><span> }; </span><span> </span><span style="color:#608b4e;">// new </span><span> </span><span style="color:#569cd6;">let mut</span><span> frame_allocator = memory::init_frame_allocator(</span><span style="color:#569cd6;">&amp;</span><span>boot_info.memory_map); </span><span> </span><span> blog_os::memory::create_example_mapping(</span><span style="color:#569cd6;">&amp;mut</span><span> recursive_page_table, </span><span style="color:#569cd6;">&amp;mut</span><span> frame_allocator); </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ (</span><span style="color:#b5cea8;">0xdeadbeaf900 </span><span style="color:#569cd6;">as *mut u64</span><span>).write_volatile(</span><span style="color:#b5cea8;">0xf021_f077_f065_f04e</span><span>)}; </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> blog_os::hlt_loop(); </span><span>} </span></code></pre> <p>Now the mapping succeeds and we see the black-on-white <em>“New!”</em> on the screen again. Behind the scenes, the <code>map_to</code> method creates the missing page tables in the following way:</p> <ul> <li>Allocate an unused frame from the passed <code>frame_allocator</code>.</li> <li>Map the entry of the higher level table to that frame. 
Now the frame is accessible through the recursive page table.</li> <li>Zero the frame to create a new, empty page table.</li> <li>Continue with the next table level.</li> </ul> <p>While our <code>create_example_mapping</code> function is just some example code, we are now able to create new mappings for arbitrary pages. This will be essential for allocating memory or implementing multithreading in future posts.</p> <h2 id="summary"><a class="zola-anchor" href="#summary" aria-label="Anchor link for: summary">🔗</a>Summary</h2> <p>In this post we learned how a recursive level 4 table entry can be used to map all page table frames to calculatable virtual addresses. We used this technique to implement an address translation function and to create a new mapping in the page tables.</p> <p>We saw that the creation of new mappings requires unused frames for creating new page tables. Such a frame allocator can be implemented on top of the boot information structure that the bootloader passes to our kernel.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>The next post will create a heap memory region for our kernel, which will allow us to <a href="https://doc.rust-lang.org/alloc/boxed/struct.Box.html">allocate memory</a> and use various <a href="https://doc.rust-lang.org/alloc/collections/index.html">collection types</a>.</p> Introduction to Paging Mon, 14 Jan 2019 00:00:00 +0000 https://os.phil-opp.com/paging-introduction/ https://os.phil-opp.com/paging-introduction/ <p>This post introduces <em>paging</em>, a very common memory management scheme that we will also use for our operating system. It explains why memory isolation is needed, how <em>segmentation</em> works, what <em>virtual memory</em> is, and how paging solves memory fragmentation issues. 
It also explores the layout of multilevel page tables on the x86_64 architecture.</p> <span id="continue-reading"></span> <p>This blog is openly developed on <a href="https://github.com/phil-opp/blog_os">GitHub</a>. If you have any problems or questions, please open an issue there. You can also leave comments <a href="https://os.phil-opp.com/paging-introduction/#comments">at the bottom</a>. The complete source code for this post can be found in the <a href="https://github.com/phil-opp/blog_os/tree/post-08"><code>post-08</code></a> branch.</p> <!-- fix for zola anchor checker (target is in template): <a id="comments"> --> <!-- toc --> <h2 id="memory-protection"><a class="zola-anchor" href="#memory-protection" aria-label="Anchor link for: memory-protection">🔗</a>Memory Protection</h2> <p>One main task of an operating system is to isolate programs from each other. Your web browser shouldn’t be able to interfere with your text editor, for example. To achieve this goal, operating systems utilize hardware functionality to ensure that memory areas of one process are not accessible by other processes. There are different approaches depending on the hardware and the OS implementation.</p> <p>As an example, some ARM Cortex-M processors (used for embedded systems) have a <a href="https://developer.arm.com/docs/ddi0337/e/memory-protection-unit/about-the-mpu"><em>Memory Protection Unit</em></a> (MPU), which allows you to define a small number (e.g., 8) of memory regions with different access permissions (e.g., no access, read-only, read-write). On each memory access, the MPU ensures that the address is in a region with correct access permissions and throws an exception otherwise. 
By changing the regions and access permissions on each process switch, the operating system can ensure that each process only accesses its own memory and thus isolates processes from each other.</p> <p>On x86, the hardware supports two different approaches to memory protection: <a href="https://en.wikipedia.org/wiki/X86_memory_segmentation">segmentation</a> and <a href="https://en.wikipedia.org/wiki/Virtual_memory#Paged_virtual_memory">paging</a>.</p> <h2 id="segmentation"><a class="zola-anchor" href="#segmentation" aria-label="Anchor link for: segmentation">🔗</a>Segmentation</h2> <p>Segmentation was already introduced in 1978, originally to increase the amount of addressable memory. The situation back then was that CPUs only used 16-bit addresses, which limited the amount of addressable memory to 64 KiB. To make more than these 64 KiB accessible, additional segment registers were introduced, each containing an offset address. The CPU automatically added this offset on each memory access, so that up to 1 MiB of memory was accessible.</p> <p>The segment register is chosen automatically by the CPU depending on the kind of memory access: For fetching instructions, the code segment <code>CS</code> is used, and for stack operations (push/pop), the stack segment <code>SS</code> is used. Other instructions use the data segment <code>DS</code> or the extra segment <code>ES</code>. Later, two additional segment registers, <code>FS</code> and <code>GS</code>, were added, which can be used freely.</p> <p>In the first version of segmentation, the segment registers directly contained the offset and no access control was performed. This was changed later with the introduction of the <a href="https://en.wikipedia.org/wiki/X86_memory_segmentation#Protected_mode"><em>protected mode</em></a>. 
When the CPU runs in this mode, the segment registers contain an index into a local or global <a href="https://en.wikipedia.org/wiki/Global_Descriptor_Table"><em>descriptor table</em></a>, which contains – in addition to an offset address – the segment size and access permissions. By loading separate global/local descriptor tables for each process, which confine memory accesses to the process’s own memory areas, the OS can isolate processes from each other.</p> <p>By modifying the memory addresses before the actual access, segmentation already employed a technique that is now used almost everywhere: <em>virtual memory</em>.</p> <h3 id="virtual-memory"><a class="zola-anchor" href="#virtual-memory" aria-label="Anchor link for: virtual-memory">🔗</a>Virtual Memory</h3> <p>The idea behind virtual memory is to abstract away the memory addresses from the underlying physical storage device. Instead of directly accessing the storage device, a translation step is performed first. For segmentation, the translation step is to add the offset address of the active segment. Imagine a program accessing memory address <code>0x1234000</code> in a segment with an offset of <code>0x1111000</code>: The address that is really accessed is <code>0x2345000</code>.</p> <p>To differentiate the two address types, addresses before the translation are called <em>virtual</em>, and addresses after the translation are called <em>physical</em>. One important difference between these two kinds of addresses is that physical addresses are unique and always refer to the same distinct memory location. Virtual addresses, on the other hand, depend on the translation function. It is entirely possible that two different virtual addresses refer to the same physical address. 
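</p> <p>A small sketch can make this concrete. It models segmentation-style translation as a plain addition and ignores the limit and permission checks that real hardware performs. The function names <code>translate</code> and <code>example_translations</code> are made up for this illustration, and the second offset/address pair is chosen arbitrarily:</p>

```rust
/// Segmentation-style translation: physical = segment offset + virtual.
/// Hypothetical helper for illustration; real hardware also checks the
/// segment size and access permissions.
pub fn translate(segment_offset: u64, virtual_addr: u64) -> u64 {
    segment_offset + virtual_addr
}

/// Returns the physical address for the example from the text and for a
/// second, arbitrarily chosen (offset, virtual address) pair.
pub fn example_translations() -> (u64, u64) {
    // the example from the text: offset 0x1111000, virtual address 0x1234000
    let from_text = translate(0x1111000, 0x1234000);
    // a different virtual address under a different segment offset
    let other = translate(0x2000000, 0x0345000);
    (from_text, other)
}
```

<p>Both calls yield the physical address <code>0x2345000</code>, even though the virtual addresses <code>0x1234000</code> and <code>0x345000</code> differ.</p> <p>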
Also, identical virtual addresses can refer to different physical addresses when they use different translation functions.</p> <p>An example where this property is useful is running the same program twice in parallel:</p> <p><img src="https://os.phil-opp.com/paging-introduction/segmentation-same-program-twice.svg" alt="Two virtual address spaces with address 0–150, one translated to 100–250, the other to 300–450" /></p> <p>Here the same program runs twice, but with different translation functions. The first instance has a segment offset of 100, so that its virtual addresses 0–150 are translated to the physical addresses 100–250. The second instance has an offset of 300, which translates its virtual addresses 0–150 to physical addresses 300–450. This allows both programs to run the same code and use the same virtual addresses without interfering with each other.</p> <p>Another advantage is that programs can now be placed at arbitrary physical memory locations, even if they use completely different virtual addresses. Thus, the OS can utilize the full amount of available memory without needing to recompile programs.</p> <h3 id="fragmentation"><a class="zola-anchor" href="#fragmentation" aria-label="Anchor link for: fragmentation">🔗</a>Fragmentation</h3> <p>The differentiation between virtual and physical addresses makes segmentation really powerful. However, it has the problem of fragmentation. As an example, imagine that we want to run a third copy of the program we saw above:</p> <p><img src="https://os.phil-opp.com/paging-introduction/segmentation-fragmentation.svg" alt="Three virtual address spaces, but there is not enough continuous space for the third" /></p> <p>There is no way to map the third instance of the program to physical memory without overlapping, even though there is more than enough free memory available. 
The problem is that we need <em>continuous</em> memory and can’t use the small free chunks.</p> <p>One way to combat this fragmentation is to pause execution, move the used parts of the memory closer together, update the translation, and then resume execution:</p> <p><img src="https://os.phil-opp.com/paging-introduction/segmentation-fragmentation-compacted.svg" alt="Three virtual address spaces after defragmentation" /></p> <p>Now there is enough continuous space to start the third instance of our program.</p> <p>The disadvantage of this defragmentation process is that it needs to copy large amounts of memory, which decreases performance. It also needs to be done regularly before the memory becomes too fragmented. This makes performance unpredictable since programs are paused at random times and might become unresponsive.</p> <p>The fragmentation problem is one of the reasons that segmentation is no longer used by most systems. In fact, segmentation is not even supported in 64-bit mode on x86 anymore. Instead, <em>paging</em> is used, which completely avoids the fragmentation problem.</p> <h2 id="paging"><a class="zola-anchor" href="#paging" aria-label="Anchor link for: paging">🔗</a>Paging</h2> <p>The idea is to divide both the virtual and physical memory space into small, fixed-size blocks. The blocks of the virtual memory space are called <em>pages</em>, and the blocks of the physical address space are called <em>frames</em>. Each page can be individually mapped to a frame, which makes it possible to split larger memory regions across non-continuous physical frames.</p> <p>The advantage of this becomes visible if we recap the example of the fragmented memory space, but use paging instead of segmentation this time:</p> <p><img src="https://os.phil-opp.com/paging-introduction/paging-fragmentation.svg" alt="With paging, the third program instance can be split across many smaller physical areas." 
/></p> <p>In this example, we have a page size of 50 bytes, which means that each of our memory regions is split across three pages. Each page is mapped to a frame individually, so a continuous virtual memory region can be mapped to non-continuous physical frames. This allows us to start the third instance of the program without performing any defragmentation before.</p> <h3 id="hidden-fragmentation"><a class="zola-anchor" href="#hidden-fragmentation" aria-label="Anchor link for: hidden-fragmentation">🔗</a>Hidden Fragmentation</h3> <p>Compared to segmentation, paging uses lots of small, fixed-sized memory regions instead of a few large, variable-sized regions. Since every frame has the same size, there are no frames that are too small to be used, so no fragmentation occurs.</p> <p>Or it <em>seems</em> like no fragmentation occurs. There is still some hidden kind of fragmentation, the so-called <em>internal fragmentation</em>. Internal fragmentation occurs because not every memory region is an exact multiple of the page size. Imagine a program of size 101 in the above example: It would still need three pages of size 50, so it would occupy 49 bytes more than needed. To differentiate the two types of fragmentation, the kind of fragmentation that happens when using segmentation is called <em>external fragmentation</em>.</p> <p>Internal fragmentation is unfortunate but often better than the external fragmentation that occurs with segmentation. It still wastes memory, but does not require defragmentation and makes the amount of fragmentation predictable (on average half a page per memory region).</p> <h3 id="page-tables"><a class="zola-anchor" href="#page-tables" aria-label="Anchor link for: page-tables">🔗</a>Page Tables</h3> <p>We saw that each of the potentially millions of pages is individually mapped to a frame. This mapping information needs to be stored somewhere. 
Segmentation uses an individual segment selector register for each active memory region, which is not possible for paging since there are way more pages than registers. Instead, paging uses a table structure called <em>page table</em> to store the mapping information.</p> <p>For our above example, the page tables would look like this:</p> <p><img src="https://os.phil-opp.com/paging-introduction/paging-page-tables.svg" alt="Three page tables, one for each program instance. For instance 1, the mapping is 0-&gt;100, 50-&gt;150, 100-&gt;200. For instance 2, it is 0-&gt;300, 50-&gt;350, 100-&gt;400. For instance 3, it is 0-&gt;250, 50-&gt;450, 100-&gt;500." /></p> <p>We see that each program instance has its own page table. A pointer to the currently active table is stored in a special CPU register. On <code>x86</code>, this register is called <code>CR3</code>. It is the job of the operating system to load this register with the pointer to the correct page table before running each program instance.</p> <p>On each memory access, the CPU reads the table pointer from the register and looks up the mapped frame for the accessed page in the table. This is entirely done in hardware and completely invisible to the running program. To speed up the translation process, many CPU architectures have a special cache that remembers the results of the last translations.</p> <p>Depending on the architecture, page table entries can also store attributes such as access permissions in a flags field. In the above example, the “r/w” flag makes the page both readable and writable.</p> <h3 id="multilevel-page-tables"><a class="zola-anchor" href="#multilevel-page-tables" aria-label="Anchor link for: multilevel-page-tables">🔗</a>Multilevel Page Tables</h3> <p>The simple page tables we just saw have a problem in larger address spaces: they waste memory. 
For example, imagine a program that uses the four virtual pages <code>0</code>, <code>1_000_000</code>, <code>1_000_050</code>, and <code>1_000_100</code> (we use <code>_</code> as a thousands separator):</p> <p><img src="https://os.phil-opp.com/paging-introduction/single-level-page-table.svg" alt="Page 0 mapped to frame 0 and pages 1_000_000–1_000_150 mapped to frames 100–250" /></p> <p>It only needs 4 physical frames, but the page table has over a million entries. We can’t omit the empty entries because then the CPU would no longer be able to jump directly to the correct entry in the translation process (e.g., it is no longer guaranteed that the fourth page uses the fourth entry).</p> <p>To reduce the wasted memory, we can use a <strong>two-level page table</strong>. The idea is that we use different page tables for different address regions. An additional table called <em>level 2</em> page table contains the mapping between address regions and (level 1) page tables.</p> <p>This is best explained by an example. Let’s define that each level 1 page table is responsible for a region of size <code>10_000</code>. Then the following tables would exist for the above example mapping:</p> <p><img src="https://os.phil-opp.com/paging-introduction/multilevel-page-table.svg" alt="Page 0 points to entry 0 of the level 2 page table, which points to the level 1 page table T1. The first entry of T1 points to frame 0; the other entries are empty. Pages 1_000_000–1_000_150 point to the 100th entry of the level 2 page table, which points to a different level 1 page table T2. The first three entries of T2 point to frames 100–250; the other entries are empty." /></p> <p>Page 0 falls into the first <code>10_000</code> byte region, so it uses the first entry of the level 2 page table. 
This entry points to level 1 page table T1, which specifies that page <code>0</code> points to frame <code>0</code>.</p> <p>The pages <code>1_000_000</code>, <code>1_000_050</code>, and <code>1_000_100</code> all fall into the 100th <code>10_000</code> byte region, so they use the 100th entry of the level 2 page table. This entry points to a different level 1 page table T2, which maps the three pages to frames <code>100</code>, <code>150</code>, and <code>200</code>. Note that the page address in level 1 tables does not include the region offset. For example, the entry for page <code>1_000_050</code> is just <code>50</code>.</p> <p>We still have 100 empty entries in the level 2 table, but much fewer than the million empty entries before. The reason for these savings is that we don’t need to create level 1 page tables for the unmapped memory regions between <code>10_000</code> and <code>1_000_000</code>.</p> <p>The principle of two-level page tables can be extended to three, four, or more levels. Then the page table register points to the highest level table, which points to the next lower level table, which points to the next lower level, and so on. The level 1 page table then points to the mapped frame. The principle in general is called a <em>multilevel</em> or <em>hierarchical</em> page table.</p> <p>Now that we know how paging and multilevel page tables work, we can look at how paging is implemented in the x86_64 architecture (we assume in the following that the CPU runs in 64-bit mode).</p> <h2 id="paging-on-x86-64"><a class="zola-anchor" href="#paging-on-x86-64" aria-label="Anchor link for: paging-on-x86-64">🔗</a>Paging on x86_64</h2> <p>The x86_64 architecture uses a 4-level page table and a page size of 4 KiB. Each page table, independent of the level, has a fixed size of 512 entries. 
Each entry has a size of 8 bytes, so each table is 512 * 8 B = 4 KiB large and thus fits exactly into one page.</p> <p>The page table index for each level is derived directly from the virtual address:</p> <p><img src="https://os.phil-opp.com/paging-introduction/x86_64-table-indices-from-address.svg" alt="Bits 0–12 are the page offset, bits 12–21 the level 1 index, bits 21–30 the level 2 index, bits 30–39 the level 3 index, and bits 39–48 the level 4 index" /></p> <p>We see that each table index consists of 9 bits, which makes sense because each table has 2^9 = 512 entries. The lowest 12 bits are the offset in the 4 KiB page (2^12 bytes = 4 KiB). Bits 48 to 64 are discarded, which means that x86_64 is not really 64-bit since it only supports 48-bit addresses.</p> <p>Even though bits 48 to 64 are discarded, they can’t be set to arbitrary values. Instead, all bits in this range have to be copies of bit 47 in order to keep addresses unique and allow future extensions like the 5-level page table. This is called <em>sign-extension</em> because it’s very similar to the <a href="https://en.wikipedia.org/wiki/Two&#x27;s_complement#Sign_extension">sign extension in two’s complement</a>. When an address is not correctly sign-extended, the CPU throws an exception.</p> <p>It’s worth noting that the recent “Ice Lake” Intel CPUs optionally support <a href="https://en.wikipedia.org/wiki/Intel_5-level_paging">5-level page tables</a> to extend virtual addresses from 48-bit to 57-bit. 
Given that optimizing our kernel for a specific CPU does not make sense at this stage, we will only work with standard 4-level page tables in this post.</p> <h3 id="example-translation"><a class="zola-anchor" href="#example-translation" aria-label="Anchor link for: example-translation">🔗</a>Example Translation</h3> <p>Let’s go through an example to understand how the translation process works in detail:</p> <p><img src="https://os.phil-opp.com/paging-introduction/x86_64-page-table-translation.svg" alt="An example of a 4-level page hierarchy with each page table shown in physical memory" /></p> <p>The physical address of the currently active level 4 page table, which is the root of the 4-level page table, is stored in the <code>CR3</code> register. Each page table entry then points to the physical frame of the next level table. The entry of the level 1 table then points to the mapped frame. Note that all addresses in the page tables are physical instead of virtual, because otherwise the CPU would need to translate those addresses too (which could cause a never-ending recursion).</p> <p>The above page table hierarchy maps two pages (in blue). From the page table indices, we can deduce that the virtual addresses of these two pages are <code>0x803FE7F000</code> and <code>0x803FE00000</code>. Let’s see what happens when the program tries to read from address <code>0x803FE7F5CE</code>. 
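</p> <p>Before doing this by hand, note that splitting a virtual address into its table indices and page offset only takes a few shifts and masks. The following is a minimal sketch in plain Rust; the function names <code>table_index</code> and <code>page_offset</code> are made up for this example and are not part of the <code>x86_64</code> crate:</p>

```rust
/// Returns the 9-bit index into the page table of the given level (1 to 4).
fn table_index(virt_addr: u64, level: u32) -> u64 {
    assert!(matches!(level, 1..=4), "4-level paging only has levels 1 to 4");
    // Skip the 12 offset bits plus 9 index bits for every lower level.
    let shift = 12 + 9 * (level - 1);
    (virt_addr >> shift) & 0x1ff
}

/// Returns the offset within the 4 KiB page (the lowest 12 bits).
fn page_offset(virt_addr: u64) -> u64 {
    virt_addr & 0xfff
}
```

<p>For the address <code>0x803FE7F5CE</code>, this yields the indices 1, 0, 511, and 127 for levels 4 down to 1, and a page offset of <code>0x5ce</code>.</p> <p>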
First, we convert the address to binary and determine the page table indices and the page offset for the address:</p> <p><img src="https://os.phil-opp.com/paging-introduction/x86_64-page-table-translation-addresses.png" alt="The sign extension bits are all 0, the level 4 index is 1, the level 3 index is 0, the level 2 index is 511, the level 1 index is 127, and the page offset is 0x5ce" /></p> <p>With these indices, we can now walk the page table hierarchy to determine the mapped frame for the address:</p> <ul> <li>We start by reading the address of the level 4 table out of the <code>CR3</code> register.</li> <li>The level 4 index is 1, so we look at the entry with index 1 of that table, which tells us that the level 3 table is stored at address 16 KiB.</li> <li>We load the level 3 table from that address and look at the entry with index 0, which points us to the level 2 table at 24 KiB.</li> <li>The level 2 index is 511, so we look at the last entry of that table to find out the address of the level 1 table.</li> <li>Through the entry with index 127 of the level 1 table, we finally find out that the page is mapped to frame 12 KiB, or 0x3000 in hexadecimal.</li> <li>The final step is to add the page offset to the frame address to get the physical address 0x3000 + 0x5ce = 0x35ce.</li> </ul> <p><img src="https://os.phil-opp.com/paging-introduction/x86_64-page-table-translation-steps.svg" alt="The same example 4-level page hierarchy with 5 additional arrows: “Step 0” from the CR3 register to the level 4 table, “Step 1” from the level 4 entry to the level 3 table, “Step 2” from the level 3 entry to the level 2 table, “Step 3” from the level 2 entry to the level 1 table, and “Step 4” from the level 1 table to the mapped frames." /></p> <p>The permissions for the page in the level 1 table are <code>r</code>, which means read-only. The hardware enforces these permissions and would throw an exception if we tried to write to that page.
Permissions in higher-level tables restrict the possible permissions in lower levels, so if we set the level 3 entry to read-only, no pages that use this entry can be writable, even if lower levels specify read/write permissions.</p> <p>It’s important to note that even though this example used only a single instance of each table, there are typically multiple instances of each level in each address space. At maximum, there are:</p> <ul> <li>one level 4 table,</li> <li>512 level 3 tables (because the level 4 table has 512 entries),</li> <li>512 * 512 level 2 tables (because each of the 512 level 3 tables has 512 entries), and</li> <li>512 * 512 * 512 level 1 tables (512 entries for each level 2 table).</li> </ul> <h3 id="page-table-format"><a class="zola-anchor" href="#page-table-format" aria-label="Anchor link for: page-table-format">🔗</a>Page Table Format</h3> <p>Page tables on the x86_64 architecture are basically an array of 512 entries. In Rust syntax:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[repr(align(4096))] </span><span style="color:#569cd6;">pub struct </span><span>PageTable { </span><span> entries: [PageTableEntry; 512], </span><span>} </span></code></pre> <p>As indicated by the <code>repr</code> attribute, page tables need to be page-aligned, i.e., aligned on a 4 KiB boundary.
This requirement guarantees that a page table always fills a complete page and allows an optimization that makes entries very compact.</p> <p>Each entry is 8 bytes (64 bits) large and has the following format:</p> <table><thead><tr><th>Bit(s)</th><th>Name</th><th>Meaning</th></tr></thead><tbody> <tr><td>0</td><td>present</td><td>the page is currently in memory</td></tr> <tr><td>1</td><td>writable</td><td>it’s allowed to write to this page</td></tr> <tr><td>2</td><td>user accessible</td><td>if not set, only kernel mode code can access this page</td></tr> <tr><td>3</td><td>write-through caching</td><td>writes go directly to memory</td></tr> <tr><td>4</td><td>disable cache</td><td>no cache is used for this page</td></tr> <tr><td>5</td><td>accessed</td><td>the CPU sets this bit when this page is used</td></tr> <tr><td>6</td><td>dirty</td><td>the CPU sets this bit when a write to this page occurs</td></tr> <tr><td>7</td><td>huge page/null</td><td>must be 0 in P1 and P4, creates a 1 GiB page in P3, creates a 2 MiB page in P2</td></tr> <tr><td>8</td><td>global</td><td>page isn’t flushed from caches on address space switch (PGE bit of CR4 register must be set)</td></tr> <tr><td>9-11</td><td>available</td><td>can be used freely by the OS</td></tr> <tr><td>12-51</td><td>physical address</td><td>the page-aligned 52-bit physical address of the frame or the next page table</td></tr> <tr><td>52-62</td><td>available</td><td>can be used freely by the OS</td></tr> <tr><td>63</td><td>no execute</td><td>forbid executing code on this page (the NXE bit in the EFER register must be set)</td></tr> </tbody></table> <p>We see that only bits 12–51 are used to store the physical frame address. The remaining bits are used as flags or can be freely used by the operating system. This is possible because we always point to a 4096-byte aligned address, either to a page-aligned page table or to the start of a mapped frame.
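</p> <p>To illustrate how such an entry can be taken apart, here is a minimal sketch that works on a raw <code>u64</code> value. The constant and function names are made up for this example and are not the API of the <code>x86_64</code> crate; only the bit positions are taken from the table above:</p>

```rust
/// Bit 0: the page is currently in memory.
const PRESENT: u64 = 0b01;
/// Bit 1: writing to the page is allowed.
const WRITABLE: u64 = 0b10;
/// Bit 63: executing code on the page is forbidden.
const NO_EXECUTE: u64 = 0x8000_0000_0000_0000;
/// Bits 12 to 51: the physical address of the frame or next table.
const ADDRESS_MASK: u64 = 0x000f_ffff_ffff_f000;

/// Extracts the physical address by clearing all flag bits.
fn frame_address(entry: u64) -> u64 {
    entry & ADDRESS_MASK
}

/// Checks whether the given flag bit is set in the entry.
fn has_flag(entry: u64, flag: u64) -> bool {
    (entry & flag) != 0
}
```

<p>The address returned by <code>frame_address</code> is always a multiple of 4096.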
This means that bits 0–11 are always zero, so there is no reason to store these bits because the hardware can just set them to zero before using the address. The same is true for bits 52–63, because the x86_64 architecture only supports 52-bit physical addresses (similar to how it only supports 48-bit virtual addresses).</p> <p>Let’s take a closer look at the available flags:</p> <ul> <li>The <code>present</code> flag differentiates mapped pages from unmapped ones. It can be used to temporarily swap out pages to disk when the main memory becomes full. When the page is accessed subsequently, a special exception called <em>page fault</em> occurs, to which the operating system can react by reloading the missing page from disk and then continuing the program.</li> <li>The <code>writable</code> and <code>no execute</code> flags control whether the contents of the page are writable or contain executable instructions, respectively.</li> <li>The <code>accessed</code> and <code>dirty</code> flags are automatically set by the CPU when a read or write to the page occurs. This information can be leveraged by the operating system, e.g., to decide which pages to swap out or whether the page contents have been modified since the last save to disk.</li> <li>The <code>write-through caching</code> and <code>disable cache</code> flags allow the control of caches for every page individually.</li> <li>The <code>user accessible</code> flag makes a page available to userspace code; otherwise, it is only accessible when the CPU is in kernel mode. This feature can be used to make <a href="https://en.wikipedia.org/wiki/System_call">system calls</a> faster by keeping the kernel mapped while a userspace program is running.
However, the <a href="https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)">Spectre</a> vulnerability can allow userspace programs to read these pages nonetheless.</li> <li>The <code>global</code> flag signals to the hardware that a page is available in all address spaces and thus does not need to be removed from the translation cache (see the section about the TLB below) on address space switches. This flag is commonly used together with a cleared <code>user accessible</code> flag to map the kernel code to all address spaces.</li> <li>The <code>huge page</code> flag allows the creation of pages of larger sizes by letting the entries of the level 2 or level 3 page tables directly point to a mapped frame. With this bit set, the page size increases by a factor of 512 to either 2 MiB = 512 * 4 KiB for level 2 entries or even 1 GiB = 512 * 2 MiB for level 3 entries. The advantage of using larger pages is that fewer lines of the translation cache and fewer page tables are needed.</li> </ul> <p>The <code>x86_64</code> crate provides types for <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTable.html">page tables</a> and their <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTableEntry.html">entries</a>, so we don’t need to create these structures ourselves.</p> <h3 id="the-translation-lookaside-buffer"><a class="zola-anchor" href="#the-translation-lookaside-buffer" aria-label="Anchor link for: the-translation-lookaside-buffer">🔗</a>The Translation Lookaside Buffer</h3> <p>A 4-level page table makes the translation of virtual addresses expensive because each translation requires four memory accesses. To improve performance, the x86_64 architecture caches the last few translations in the so-called <em>translation lookaside buffer</em> (TLB).
This allows skipping the translation when it is still cached.</p> <p>Unlike the other CPU caches, the TLB is not fully transparent and does not update or remove translations when the contents of page tables change. This means that the kernel must manually update the TLB whenever it modifies a page table. To do this, there is a special CPU instruction called <a href="https://www.felixcloutier.com/x86/INVLPG.html"><code>invlpg</code></a> (“invalidate page”) that removes the translation for the specified page from the TLB, so that it is loaded again from the page table on the next access. The TLB can also be flushed completely by reloading the <code>CR3</code> register, which simulates an address space switch. The <code>x86_64</code> crate provides Rust functions for both variants in the <a href="https://docs.rs/x86_64/0.14.2/x86_64/instructions/tlb/index.html"><code>tlb</code> module</a>.</p> <p>It is important to remember to flush the TLB on each page table modification because otherwise, the CPU might keep using the old translation, which can lead to non-deterministic bugs that are very hard to debug.</p> <h2 id="implementation"><a class="zola-anchor" href="#implementation" aria-label="Anchor link for: implementation">🔗</a>Implementation</h2> <p>One thing that we did not mention yet: <strong>Our kernel already runs on paging</strong>. The bootloader that we added in the <a href="https://os.phil-opp.com/minimal-rust-kernel/#creating-a-bootimage">“A minimal Rust Kernel”</a> post has already set up a 4-level paging hierarchy that maps every page of our kernel to a physical frame. The bootloader does this because paging is mandatory in 64-bit mode on x86_64.</p> <p>This means that every memory address that we used in our kernel was a virtual address. 
Accessing the VGA buffer at address <code>0xb8000</code> only worked because the bootloader <em>identity mapped</em> that memory page, which means that it mapped the virtual page <code>0xb8000</code> to the physical frame <code>0xb8000</code>.</p> <p>Paging makes our kernel already relatively safe, since every memory access that is out of bounds causes a page fault exception instead of writing to random physical memory. The bootloader even sets the correct access permissions for each page, which means that only the pages containing code are executable and only data pages are writable.</p> <h3 id="page-faults"><a class="zola-anchor" href="#page-faults" aria-label="Anchor link for: page-faults">🔗</a>Page Faults</h3> <p>Let’s try to cause a page fault by accessing some memory outside of our kernel. First, we create a page fault handler and register it in our IDT, so that we see a page fault exception instead of a generic <a href="https://os.phil-opp.com/double-fault-exceptions/">double fault</a>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span>lazy_static! 
{ </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">IDT</span><span>: InterruptDescriptorTable = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = InterruptDescriptorTable::new(); </span><span> </span><span> […] </span><span> </span><span> idt.page_fault.set_handler_fn(page_fault_handler); </span><span style="color:#608b4e;">// new </span><span> </span><span> idt </span><span> }; </span><span>} </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::idt::PageFaultErrorCode; </span><span style="color:#569cd6;">use crate</span><span>::hlt_loop; </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;x86-interrupt&quot; </span><span style="color:#569cd6;">fn </span><span>page_fault_handler( </span><span> stack_frame: InterruptStackFrame, </span><span> error_code: PageFaultErrorCode, </span><span>) { </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::registers::control::Cr2; </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;EXCEPTION: PAGE FAULT&quot;</span><span>); </span><span> println!(</span><span style="color:#d69d85;">&quot;Accessed Address: </span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, Cr2::read()); </span><span> println!(</span><span style="color:#d69d85;">&quot;Error Code: </span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, error_code); </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{:#?}</span><span style="color:#d69d85;">&quot;</span><span>, stack_frame); </span><span> hlt_loop(); </span><span>} </span></code></pre> <p>The <a href="https://en.wikipedia.org/wiki/Control_register#CR2"><code>CR2</code></a> register is automatically set by the CPU on a page fault and contains the accessed virtual address that caused 
the page fault. We use the <a href="https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr2.html#method.read"><code>Cr2::read</code></a> function of the <code>x86_64</code> crate to read and print it. The <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html"><code>PageFaultErrorCode</code></a> type provides more information about the type of memory access that caused the page fault, for example, whether it was caused by a read or write operation. For this reason, we print it too. We can’t continue execution without resolving the page fault, so we enter a <a href="https://os.phil-opp.com/hardware-interrupts/#the-hlt-instruction"><code>hlt_loop</code></a> at the end.</p> <p>Now we can try to access some memory outside our kernel:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> </span><span> blog_os::init(); </span><span> </span><span> </span><span style="color:#608b4e;">// new </span><span> </span><span style="color:#569cd6;">let</span><span> ptr = </span><span style="color:#b5cea8;">0xdeadbeaf </span><span style="color:#569cd6;">as *mut u8</span><span>; </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ *ptr = </span><span style="color:#b5cea8;">42</span><span>; } </span><span> </span><span> </span><span style="color:#608b4e;">// as before </span><span> #[cfg(test)] </span><span> test_main(); </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> blog_os::hlt_loop(); </span><span>} </span></code></pre> <p>When we run it, we see that our page fault handler is called:</p> <p><img src="https://os.phil-opp.com/paging-introduction/qemu-page-fault.png" alt="EXCEPTION: Page Fault, Accessed Address: VirtAddr(0xdeadbeaf), Error Code: CAUSED_BY_WRITE, InterruptStackFrame: {…}" /></p> <p>The <code>CR2</code> register indeed contains <code>0xdeadbeaf</code>, the address that we tried to access. The error code tells us through the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html#associatedconstant.CAUSED_BY_WRITE"><code>CAUSED_BY_WRITE</code></a> that the fault occurred while trying to perform a write operation. It tells us even more through the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html">bits that are <em>not</em> set</a>. 
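</p> <p>To make the meaning of the individual bits more concrete, here is a minimal sketch that decodes a raw error code by hand. The bit positions follow the x86 page fault error code (bit 0: protection violation, bit 1: caused by write, bit 2: user mode); the function name <code>describe_fault</code> is made up for this example, and our handler itself keeps using the <code>PageFaultErrorCode</code> type instead:</p>

```rust
/// Bit 0: set if the page was present but the access was not allowed.
const PROTECTION_VIOLATION: u64 = 0b001;
/// Bit 1: set if the fault was caused by a write, cleared for a read.
const CAUSED_BY_WRITE: u64 = 0b010;
/// Bit 2: set if the access happened in user mode.
const USER_MODE: u64 = 0b100;

/// Describes a raw page fault error code in words (illustrative helper).
fn describe_fault(error_code: u64) -> String {
    let cause = if (error_code & PROTECTION_VIOLATION) != 0 {
        "protection violation"
    } else {
        "page not present"
    };
    let access = if (error_code & CAUSED_BY_WRITE) != 0 { "write" } else { "read" };
    let mode = if (error_code & USER_MODE) != 0 { "user" } else { "kernel" };
    format!("{} during a {} in {} mode", cause, access, mode)
}
```

<p>An error code in which only the <code>CAUSED_BY_WRITE</code> bit is set, like the one in the screenshot above, is described as “page not present during a write in kernel mode”.</p> <p>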
For example, the fact that the <code>PROTECTION_VIOLATION</code> flag is not set means that the page fault occurred because the target page wasn’t present.</p> <p>We see that the current instruction pointer is <code>0x2031b2</code>, so we know that this address points to a code page. Code pages are mapped read-only by the bootloader, so reading from this address works but writing causes a page fault. You can try this by changing the <code>0xdeadbeaf</code> pointer to <code>0x2031b2</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// Note: The actual address might be different for you. Use the address that </span><span style="color:#608b4e;">// your page fault handler reports. </span><span style="color:#569cd6;">let</span><span> ptr = </span><span style="color:#b5cea8;">0x2031b2 </span><span style="color:#569cd6;">as *mut u8</span><span>; </span><span> </span><span style="color:#608b4e;">// read from a code page </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">let</span><span> x = *ptr; } </span><span>println!(</span><span style="color:#d69d85;">&quot;read worked&quot;</span><span>); </span><span> </span><span style="color:#608b4e;">// write to a code page </span><span style="color:#569cd6;">unsafe </span><span>{ *ptr = </span><span style="color:#b5cea8;">42</span><span>; } </span><span>println!(</span><span style="color:#d69d85;">&quot;write worked&quot;</span><span>); </span></code></pre> <p>By commenting out the last line, we see that the read access works, but the write access causes a page fault:</p> <p><img src="https://os.phil-opp.com/paging-introduction/qemu-page-fault-protection.png" alt="QEMU with output: “read worked, EXCEPTION: Page Fault, Accessed Address: VirtAddr(0x2031b2), Error Code: PROTECTION_VIOLATION | CAUSED_BY_WRITE, InterruptStackFrame: {…}”" /></p> <p>We see that the 
<em>“read worked”</em> message is printed, which indicates that the read operation did not cause any errors. However, instead of the <em>“write worked”</em> message, a page fault occurs. This time the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html#associatedconstant.PROTECTION_VIOLATION"><code>PROTECTION_VIOLATION</code></a> flag is set in addition to the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html#associatedconstant.CAUSED_BY_WRITE"><code>CAUSED_BY_WRITE</code></a> flag, which indicates that the page was present, but the operation was not allowed on it. In this case, writes to the page are not allowed since code pages are mapped as read-only.</p> <h3 id="accessing-the-page-tables"><a class="zola-anchor" href="#accessing-the-page-tables" aria-label="Anchor link for: accessing-the-page-tables">🔗</a>Accessing the Page Tables</h3> <p>Let’s try to take a look at the page tables that define how our kernel is mapped:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> </span><span> blog_os::init(); </span><span> </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::registers::control::Cr3; </span><span> </span><span> </span><span style="color:#569cd6;">let </span><span>(level_4_page_table, </span><span style="color:#569cd6;">_</span><span>) = Cr3::read(); </span><span> println!(</span><span style="color:#d69d85;">&quot;Level 4 page table at: </span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, level_4_page_table.start_address()); </span><span> </span><span> […] </span><span style="color:#608b4e;">// test_main(), println(…), and hlt_loop() </span><span>} </span></code></pre> <p>The <a href="https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr3.html#method.read"><code>Cr3::read</code></a> function of the <code>x86_64</code> crate returns the currently active level 4 page table from the <code>CR3</code> register. It returns a tuple of a <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/frame/struct.PhysFrame.html"><code>PhysFrame</code></a> and a <a href="https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr3Flags.html"><code>Cr3Flags</code></a> type. We are only interested in the frame, so we ignore the second element of the tuple.</p> <p>When we run it, we see the following output:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>Level 4 page table at: PhysAddr(0x1000) </span></code></pre> <p>So the currently active level 4 page table is stored at address <code>0x1000</code> in <em>physical</em> memory, as indicated by the <a href="https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.PhysAddr.html"><code>PhysAddr</code></a> wrapper type. 
The question now is: how can we access this table from our kernel?</p> <p>Accessing physical memory directly is not possible when paging is active, since programs could easily circumvent memory protection and access the memory of other programs otherwise. So the only way to access the table is through some virtual page that is mapped to the physical frame at address <code>0x1000</code>. This problem of creating mappings for page table frames is a general problem since the kernel needs to access the page tables regularly, for example, when allocating a stack for a new thread.</p> <p>Solutions to this problem are explained in detail in the next post.</p> <h2 id="summary"><a class="zola-anchor" href="#summary" aria-label="Anchor link for: summary">🔗</a>Summary</h2> <p>This post introduced two memory protection techniques: segmentation and paging. While the former uses variable-sized memory regions and suffers from external fragmentation, the latter uses fixed-sized pages and allows much more fine-grained control over access permissions.</p> <p>Paging stores the mapping information for pages in page tables with one or more levels. The x86_64 architecture uses 4-level page tables and a page size of 4 KiB. The hardware automatically walks the page tables and caches the resulting translations in the translation lookaside buffer (TLB). This buffer is not updated transparently and needs to be flushed manually on page table changes.</p> <p>We learned that our kernel already runs on top of paging and that illegal memory accesses cause page fault exceptions. We tried to access the currently active page tables, but we weren’t able to do it because the CR3 register stores a physical address that we can’t access directly from our kernel.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>The next post explains how to implement support for paging in our kernel. 
It presents different ways to access physical memory from our kernel, which makes it possible to access the page tables that our kernel runs on. At that point, we will be able to implement functions for translating virtual to physical addresses and for creating new mappings in the page tables.</p> Hardware Interrupts Mon, 22 Oct 2018 00:00:00 +0000 https://os.phil-opp.com/hardware-interrupts/ <p>In this post, we set up the programmable interrupt controller to correctly forward hardware interrupts to the CPU. To handle these interrupts, we add new entries to our interrupt descriptor table, just like we did for our exception handlers. We will learn how to get periodic timer interrupts and how to get input from the keyboard.</p> <span id="continue-reading"></span> <p>This blog is openly developed on <a href="https://github.com/phil-opp/blog_os">GitHub</a>. If you have any problems or questions, please open an issue there. You can also leave comments <a href="https://os.phil-opp.com/hardware-interrupts/#comments">at the bottom</a>. The complete source code for this post can be found in the <a href="https://github.com/phil-opp/blog_os/tree/post-07"><code>post-07</code></a> branch.</p> <h2 id="overview"><a class="zola-anchor" href="#overview" aria-label="Anchor link for: overview">🔗</a>Overview</h2> <p>Interrupts provide a way for attached hardware devices to notify the CPU. So instead of letting the kernel periodically check the keyboard for new characters (a process called <a href="https://en.wikipedia.org/wiki/Polling_(computer_science)"><em>polling</em></a>), the keyboard can notify the kernel of each keypress. This is much more efficient because the kernel only needs to act when something happens. 
It also allows faster reaction times since the kernel can react immediately and not only at the next poll.</p> <p>Connecting all hardware devices directly to the CPU is not possible. Instead, a separate <em>interrupt controller</em> aggregates the interrupts from all devices and then notifies the CPU:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>                     ____________             _____
</span><span>Timer ------------&gt; |            |           |     |
</span><span>Keyboard ---------&gt; | Interrupt  |---------&gt; | CPU |
</span><span>Other Hardware ---&gt; | Controller |           |_____|
</span><span>Etc. -------------&gt; |____________|
</span></code></pre> <p>Most interrupt controllers are programmable, which means they support different priority levels for interrupts. For example, this makes it possible to give timer interrupts a higher priority than keyboard interrupts to ensure accurate timekeeping.</p> <p>Unlike exceptions, hardware interrupts occur <em>asynchronously</em>. This means they are completely independent from the executed code and can occur at any time. Thus, we suddenly have a form of concurrency in our kernel with all the potential concurrency-related bugs. Rust’s strict ownership model helps us here because it forbids mutable global state. However, deadlocks are still possible, as we will see later in this post.</p> <h2 id="the-8259-pic"><a class="zola-anchor" href="#the-8259-pic" aria-label="Anchor link for: the-8259-pic">🔗</a>The 8259 PIC</h2> <p>The <a href="https://en.wikipedia.org/wiki/Intel_8259">Intel 8259</a> is a programmable interrupt controller (PIC) introduced in 1976. It has long been replaced by the newer <a href="https://en.wikipedia.org/wiki/Intel_APIC_Architecture">APIC</a>, but its interface is still supported on current systems for backwards compatibility reasons. 
The 8259 PIC is significantly easier to set up than the APIC, so we will use it to get familiar with interrupts before we switch to the APIC in a later post.</p> <p>The 8259 has eight interrupt lines and several lines for communicating with the CPU. Typical systems back then were equipped with two instances of the 8259 PIC, one primary and one secondary, with the secondary connected to one of the interrupt lines of the primary:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>                     ____________                          ____________
</span><span>Real Time Clock --&gt; |            |   Timer -------------&gt; |            |
</span><span>ACPI -------------&gt; |            |   Keyboard-----------&gt; |            |      _____
</span><span>Available --------&gt; | Secondary  |----------------------&gt; | Primary    |     |     |
</span><span>Available --------&gt; | Interrupt  |   Serial Port 2 -----&gt; | Interrupt  |---&gt; | CPU |
</span><span>Mouse ------------&gt; | Controller |   Serial Port 1 -----&gt; | Controller |     |_____|
</span><span>Co-Processor -----&gt; |            |   Parallel Port 2/3 -&gt; |            |
</span><span>Primary ATA ------&gt; |            |   Floppy disk -------&gt; |            |
</span><span>Secondary ATA ----&gt; |____________|   Parallel Port 1----&gt; |____________|
</span></code></pre> <p>This graphic shows the typical assignment of interrupt lines. We see that most of the 15 lines have a fixed mapping, e.g., line 4 of the secondary PIC is assigned to the mouse.</p> <p>Each controller can be configured through two <a href="https://os.phil-opp.com/testing/#i-o-ports">I/O ports</a>, one “command” port and one “data” port. For the primary controller, these ports are <code>0x20</code> (command) and <code>0x21</code> (data). For the secondary controller, they are <code>0xa0</code> (command) and <code>0xa1</code> (data). 
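</p> <p>As a small illustration, the four port numbers above can be written down as constants. This is only a sketch that runs as ordinary Rust outside the kernel: the <code>PicPorts</code> type, the constant names, and the <code>pic_ports</code> helper are made up for this example and are not part of the <code>pic8259</code> crate or any other library.</p>

```rust
/// I/O port numbers of one 8259 controller.
/// `PicPorts` is a made-up type for this sketch, not an API of any crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PicPorts {
    /// Port that receives commands.
    pub command: u16,
    /// Port that receives configuration data.
    pub data: u16,
}

/// Ports of the primary controller, as listed in the text.
pub const PRIMARY_PIC: PicPorts = PicPorts { command: 0x20, data: 0x21 };

/// Ports of the secondary controller, as listed in the text.
pub const SECONDARY_PIC: PicPorts = PicPorts { command: 0xa0, data: 0xa1 };

/// Returns the ports of both controllers, primary first.
pub fn pic_ports() -> [PicPorts; 2] {
    [PRIMARY_PIC, SECONDARY_PIC]
}
```

<p>For both controllers, the data port is the command port plus one, which makes the pairs easy to remember.</p> <p>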
For more information on how the PICs can be configured, see the <a href="https://wiki.osdev.org/8259_PIC">article on osdev.org</a>.</p> <h3 id="implementation"><a class="zola-anchor" href="#implementation" aria-label="Anchor link for: implementation">🔗</a>Implementation</h3> <p>The default configuration of the PICs is not usable because it sends interrupt vector numbers in the range of 0–15 to the CPU. These numbers are already occupied by CPU exceptions. For example, number 8 corresponds to a double fault. To fix this overlapping issue, we need to remap the PIC interrupts to different numbers. The actual range doesn’t matter as long as it does not overlap with the exceptions, but typically the range of 32–47 is chosen, because these are the first free numbers after the 32 exception slots.</p> <p>The configuration happens by writing special values to the command and data ports of the PICs. Fortunately, there is already a crate called <a href="https://docs.rs/pic8259/0.10.1/pic8259/"><code>pic8259</code></a>, so we don’t need to write the initialization sequence ourselves. However, if you are interested in how it works, check out <a href="https://docs.rs/crate/pic8259/0.10.1/source/src/lib.rs">its source code</a>. It’s fairly small and well documented.</p> <p>To add the crate as a dependency, we add the following to our project:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">pic8259 </span><span>= </span><span style="color:#d69d85;">&quot;0.10.1&quot; </span></code></pre> <p>The main abstraction provided by the crate is the <a href="https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html"><code>ChainedPics</code></a> struct that represents the primary/secondary PIC layout we saw above. 
It is designed to be used in the following way:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">use </span><span>pic8259::ChainedPics; </span><span style="color:#569cd6;">use</span><span> spin; </span><span> </span><span style="color:#569cd6;">pub const </span><span style="color:#b4cea8;">PIC_1_OFFSET</span><span>: </span><span style="color:#569cd6;">u8 </span><span>= </span><span style="color:#b5cea8;">32</span><span>; </span><span style="color:#569cd6;">pub const </span><span style="color:#b4cea8;">PIC_2_OFFSET</span><span>: </span><span style="color:#569cd6;">u8 </span><span>= </span><span style="color:#b4cea8;">PIC_1_OFFSET </span><span>+ </span><span style="color:#b5cea8;">8</span><span>; </span><span> </span><span style="color:#569cd6;">pub static </span><span style="color:#b4cea8;">PICS</span><span>: spin::Mutex&lt;ChainedPics&gt; = </span><span> spin::Mutex::new(</span><span style="color:#569cd6;">unsafe </span><span>{ ChainedPics::new(</span><span style="color:#b4cea8;">PIC_1_OFFSET</span><span>, </span><span style="color:#b4cea8;">PIC_2_OFFSET</span><span>) }); </span></code></pre> <p>As noted above, we’re setting the offsets for the PICs to the range 32–47. By wrapping the <code>ChainedPics</code> struct in a <code>Mutex</code>, we can get safe mutable access (through the <a href="https://docs.rs/spin/0.5.2/spin/struct.Mutex.html#method.lock"><code>lock</code> method</a>), which we need in the next step. 
The <code>ChainedPics::new</code> function is unsafe because wrong offsets could cause undefined behavior.</p> <p>We can now initialize the 8259 PIC in our <code>init</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init() { </span><span> gdt::init(); </span><span> interrupts::init_idt(); </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ interrupts::</span><span style="color:#b4cea8;">PICS</span><span>.lock().initialize() }; </span><span style="color:#608b4e;">// new </span><span>} </span></code></pre> <p>We use the <a href="https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html#method.initialize"><code>initialize</code></a> function to perform the PIC initialization. Like the <code>ChainedPics::new</code> function, this function is also unsafe because it can cause undefined behavior if the PIC is misconfigured.</p> <p>If all goes well, we should continue to see the “It did not crash” message when executing <code>cargo run</code>.</p> <h2 id="enabling-interrupts"><a class="zola-anchor" href="#enabling-interrupts" aria-label="Anchor link for: enabling-interrupts">🔗</a>Enabling Interrupts</h2> <p>Until now, nothing happened because interrupts are still disabled in the CPU configuration. This means that the CPU does not listen to the interrupt controller at all, so no interrupts can reach the CPU. 
Let’s change that:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init() { </span><span> gdt::init(); </span><span> interrupts::init_idt(); </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ interrupts::</span><span style="color:#b4cea8;">PICS</span><span>.lock().initialize() }; </span><span> x86_64::instructions::interrupts::enable(); </span><span style="color:#608b4e;">// new </span><span>} </span></code></pre> <p>The <code>interrupts::enable</code> function of the <code>x86_64</code> crate executes the special <code>sti</code> instruction (“set interrupts”) to enable external interrupts. When we try <code>cargo run</code> now, we see that a double fault occurs:</p> <p><img src="https://os.phil-opp.com/hardware-interrupts/qemu-hardware-timer-double-fault.png" alt="QEMU printing EXCEPTION: DOUBLE FAULT because of hardware timer" /></p> <p>The reason for this double fault is that the hardware timer (the <a href="https://en.wikipedia.org/wiki/Intel_8253">Intel 8253</a>, to be exact) is enabled by default, so we start receiving timer interrupts as soon as we enable interrupts. Since we didn’t define a handler function for it yet, our double fault handler is invoked.</p> <h2 id="handling-timer-interrupts"><a class="zola-anchor" href="#handling-timer-interrupts" aria-label="Anchor link for: handling-timer-interrupts">🔗</a>Handling Timer Interrupts</h2> <p>As we see from the graphic <a href="https://os.phil-opp.com/hardware-interrupts/#the-8259-pic">above</a>, the timer uses line 0 of the primary PIC. This means that it arrives at the CPU as interrupt 32 (0 + offset 32). 
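</p> <p>The same arithmetic applies to every line of both controllers. The following sketch spells it out as ordinary Rust that runs outside the kernel. The <code>Pic</code> enum and the <code>vector_for_line</code> helper are hypothetical names introduced only for this illustration; the two offsets are the ones chosen earlier in this post.</p>

```rust
/// Vector offsets as chosen earlier in this post.
pub const PIC_1_OFFSET: u8 = 32;
pub const PIC_2_OFFSET: u8 = PIC_1_OFFSET + 8;

/// Which of the two chained controllers a line belongs to
/// (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pic {
    Primary,
    Secondary,
}

/// Hypothetical helper: the interrupt vector at which `line` of `pic`
/// arrives at the CPU. Each 8259 has eight lines (0 to 7), so any
/// other line number yields `None`.
pub fn vector_for_line(pic: Pic, line: u8) -> Option<u8> {
    if line > 7 {
        return None;
    }
    let offset = match pic {
        Pic::Primary => PIC_1_OFFSET,
        Pic::Secondary => PIC_2_OFFSET,
    };
    Some(offset + line)
}
```

<p>With these offsets, line 0 of the primary PIC maps to vector 32 and line 7 of the secondary PIC maps to vector 47, which matches the 32–47 range chosen above.</p> <p>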
Instead of hardcoding index 32, we store it in an <code>InterruptIndex</code> enum:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span>#[derive(Debug, Clone, Copy)] </span><span>#[repr(u8)] </span><span style="color:#569cd6;">pub enum </span><span>InterruptIndex { </span><span> Timer = </span><span style="color:#b4cea8;">PIC_1_OFFSET</span><span>, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>InterruptIndex { </span><span> </span><span style="color:#569cd6;">fn </span><span>as_u8(self) -&gt; </span><span style="color:#569cd6;">u8 </span><span>{ </span><span> self </span><span style="color:#569cd6;">as u8 </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>as_usize(self) -&gt; </span><span style="color:#569cd6;">usize </span><span>{ </span><span> </span><span style="color:#569cd6;">usize</span><span>::from(self.as_u8()) </span><span> } </span><span>} </span></code></pre> <p>The enum is a <a href="https://doc.rust-lang.org/reference/items/enumerations.html#custom-discriminant-values-for-fieldless-enumerations">C-like enum</a> so that we can directly specify the index for each variant. The <code>repr(u8)</code> attribute specifies that each variant is represented as a <code>u8</code>. We will add more variants for other interrupts in the future.</p> <p>Now we can add a handler function for the timer interrupt:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">use crate</span><span>::print; </span><span> </span><span>lazy_static! 
{ </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">IDT</span><span>: InterruptDescriptorTable = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = InterruptDescriptorTable::new(); </span><span> idt.breakpoint.set_handler_fn(breakpoint_handler); </span><span> […] </span><span> idt[InterruptIndex::Timer.as_usize()] </span><span> .set_handler_fn(timer_interrupt_handler); </span><span style="color:#608b4e;">// new </span><span> </span><span> idt </span><span> }; </span><span>} </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;x86-interrupt&quot; </span><span style="color:#569cd6;">fn </span><span>timer_interrupt_handler( </span><span> _stack_frame: InterruptStackFrame) </span><span>{ </span><span> print!(</span><span style="color:#d69d85;">&quot;.&quot;</span><span>); </span><span>} </span></code></pre> <p>Our <code>timer_interrupt_handler</code> has the same signature as our exception handlers, because the CPU reacts identically to exceptions and external interrupts (the only difference is that some exceptions push an error code). The <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html"><code>InterruptDescriptorTable</code></a> struct implements the <a href="https://doc.rust-lang.org/core/ops/trait.IndexMut.html"><code>IndexMut</code></a> trait, so we can access individual entries through array indexing syntax.</p> <p>In our timer interrupt handler, we print a dot to the screen. As the timer interrupt happens periodically, we would expect to see a dot appearing on each timer tick. 
However, when we run it, we see that only a single dot is printed:</p> <p><img src="https://os.phil-opp.com/hardware-interrupts/qemu-single-dot-printed.png" alt="QEMU printing only a single dot for hardware timer" /></p> <h3 id="end-of-interrupt"><a class="zola-anchor" href="#end-of-interrupt" aria-label="Anchor link for: end-of-interrupt">🔗</a>End of Interrupt</h3> <p>The reason is that the PIC expects an explicit “end of interrupt” (EOI) signal from our interrupt handler. This signal tells the controller that the interrupt was processed and that the system is ready to receive the next interrupt. So the PIC thinks we’re still busy processing the first timer interrupt and waits patiently for the EOI signal before sending the next one.</p> <p>To send the EOI, we use our static <code>PICS</code> struct again:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;x86-interrupt&quot; </span><span style="color:#569cd6;">fn </span><span>timer_interrupt_handler( </span><span> _stack_frame: InterruptStackFrame) </span><span>{ </span><span> print!(</span><span style="color:#d69d85;">&quot;.&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#b4cea8;">PICS</span><span>.lock() </span><span> .notify_end_of_interrupt(InterruptIndex::Timer.as_u8()); </span><span> } </span><span>} </span></code></pre> <p>The <code>notify_end_of_interrupt</code> method figures out whether the primary or secondary PIC sent the interrupt and then uses the <code>command</code> and <code>data</code> ports to send an EOI signal to the respective controllers. 
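</p> <p>The decision of which controller to notify can be sketched as a pure function. This is an illustration of the rule described here, written as ordinary Rust that runs outside the kernel. It is not the code of the <code>pic8259</code> crate, and the <code>EoiTargets</code>, <code>handles</code>, and <code>eoi_targets</code> names are invented for this example.</p>

```rust
/// Vector offsets as chosen earlier in this post.
pub const PIC_1_OFFSET: u8 = 32;
pub const PIC_2_OFFSET: u8 = PIC_1_OFFSET + 8;

/// Which controllers need an EOI for a given interrupt vector
/// (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EoiTargets {
    pub primary: bool,
    pub secondary: bool,
}

/// A controller with the given offset owns the eight vectors
/// `offset..offset + 8`.
fn handles(offset: u8, vector: u8) -> bool {
    offset <= vector && vector < offset + 8
}

/// Hypothetical helper: decides which controllers must be notified.
/// An interrupt from the secondary PIC also passed through the primary
/// PIC, so both are notified in that case.
pub fn eoi_targets(vector: u8) -> EoiTargets {
    let secondary = handles(PIC_2_OFFSET, vector);
    let primary = secondary || handles(PIC_1_OFFSET, vector);
    EoiTargets { primary, secondary }
}
```

<p>For the timer interrupt (vector 32), only the primary PIC is selected. A vector outside 32–47 selects neither controller.</p> <p>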
If the secondary PIC sent the interrupt, both PICs need to be notified because the secondary PIC is connected to an input line of the primary PIC.</p> <p>We need to be careful to use the correct interrupt vector number, otherwise we could accidentally delete an important unsent interrupt or cause our system to hang. This is the reason that the function is unsafe.</p> <p>When we now execute <code>cargo run</code> we see dots periodically appearing on the screen:</p> <p><img src="https://os.phil-opp.com/hardware-interrupts/qemu-hardware-timer-dots.gif" alt="QEMU printing consecutive dots showing the hardware timer" /></p> <h3 id="configuring-the-timer"><a class="zola-anchor" href="#configuring-the-timer" aria-label="Anchor link for: configuring-the-timer">🔗</a>Configuring the Timer</h3> <p>The hardware timer that we use is called the <em>Programmable Interval Timer</em>, or PIT for short. As the name suggests, it is possible to configure the interval between two interrupts. We won’t go into details here because we will switch to the <a href="https://wiki.osdev.org/APIC_timer">APIC timer</a> soon, but the OSDev wiki has an extensive article about <a href="https://wiki.osdev.org/Programmable_Interval_Timer">configuring the PIT</a>.</p> <h2 id="deadlocks"><a class="zola-anchor" href="#deadlocks" aria-label="Anchor link for: deadlocks">🔗</a>Deadlocks</h2> <p>We now have a form of concurrency in our kernel: The timer interrupts occur asynchronously, so they can interrupt our <code>_start</code> function at any time. Fortunately, Rust’s ownership system prevents many types of concurrency-related bugs at compile time. One notable exception is deadlocks. Deadlocks occur if a thread tries to acquire a lock that will never become free. Thus, the thread hangs indefinitely.</p> <p>We can already provoke a deadlock in our kernel. 
Remember, our <code>println</code> macro calls the <code>vga_buffer::_print</code> function, which <a href="https://os.phil-opp.com/vga-text-mode/#spinlocks">locks a global <code>WRITER</code></a> using a spinlock:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span>[…] </span><span> </span><span>#[doc(hidden)] </span><span style="color:#569cd6;">pub fn </span><span>_print(args: fmt::Arguments) { </span><span> </span><span style="color:#569cd6;">use </span><span>core::fmt::Write; </span><span> </span><span style="color:#b4cea8;">WRITER</span><span>.lock().write_fmt(args).unwrap(); </span><span>} </span></code></pre> <p>It locks the <code>WRITER</code>, calls <code>write_fmt</code> on it, and implicitly unlocks it at the end of the function. Now imagine that an interrupt occurs while the <code>WRITER</code> is locked and the interrupt handler tries to print something too:</p> <table><thead><tr><th>Timestep</th><th>_start</th><th>interrupt_handler</th></tr></thead><tbody> <tr><td>0</td><td>calls <code>println!</code></td><td> </td></tr> <tr><td>1</td><td><code>print</code> locks <code>WRITER</code></td><td> </td></tr> <tr><td>2</td><td></td><td><strong>interrupt occurs</strong>, handler begins to run</td></tr> <tr><td>3</td><td></td><td>calls <code>println!</code></td></tr> <tr><td>4</td><td></td><td><code>print</code> tries to lock <code>WRITER</code> (already locked)</td></tr> <tr><td>5</td><td></td><td><code>print</code> tries to lock <code>WRITER</code> (already locked)</td></tr> <tr><td>…</td><td></td><td>…</td></tr> <tr><td><em>never</em></td><td><em>unlock <code>WRITER</code></em></td><td></td></tr> </tbody></table> <p>The <code>WRITER</code> is locked, so the interrupt handler waits until it becomes free. 
But this never happens, because the <code>_start</code> function only continues to run after the interrupt handler returns. Thus, the entire system hangs.</p> <h3 id="provoking-a-deadlock"><a class="zola-anchor" href="#provoking-a-deadlock" aria-label="Anchor link for: provoking-a-deadlock">🔗</a>Provoking a Deadlock</h3> <p>We can easily provoke such a deadlock in our kernel by printing something in the loop at the end of our <code>_start</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> […] </span><span> </span><span style="color:#569cd6;">loop </span><span>{ </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::print; </span><span> print!(</span><span style="color:#d69d85;">&quot;-&quot;</span><span>); </span><span style="color:#608b4e;">// new </span><span> } </span><span>} </span></code></pre> <p>When we run it in QEMU, we get an output of the form:</p> <p><img src="https://os.phil-opp.com/hardware-interrupts/./qemu-deadlock.png" alt="QEMU output with many rows of hyphens and no dots" /></p> <p>We see that only a limited number of hyphens are printed until the first timer interrupt occurs. Then the system hangs because the timer interrupt handler deadlocks when it tries to print a dot. This is the reason that we see no dots in the above output.</p> <p>The actual number of hyphens varies between runs because the timer interrupt occurs asynchronously. 
This non-determinism is what makes concurrency-related bugs so difficult to debug.</p> <h3 id="fixing-the-deadlock"><a class="zola-anchor" href="#fixing-the-deadlock" aria-label="Anchor link for: fixing-the-deadlock">🔗</a>Fixing the Deadlock</h3> <p>To avoid this deadlock, we can disable interrupts as long as the <code>Mutex</code> is locked:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#608b4e;">/// Prints the given formatted string to the VGA text buffer </span><span style="color:#608b4e;">/// through the global `WRITER` instance. </span><span>#[doc(hidden)] </span><span style="color:#569cd6;">pub fn </span><span>_print(args: fmt::Arguments) { </span><span> </span><span style="color:#569cd6;">use </span><span>core::fmt::Write; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::interrupts; </span><span style="color:#608b4e;">// new </span><span> </span><span> interrupts::without_interrupts(|| { </span><span style="color:#608b4e;">// new </span><span> </span><span style="color:#b4cea8;">WRITER</span><span>.lock().write_fmt(args).unwrap(); </span><span> }); </span><span>} </span></code></pre> <p>The <a href="https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.without_interrupts.html"><code>without_interrupts</code></a> function takes a <a href="https://doc.rust-lang.org/book/ch13-01-closures.html">closure</a> and executes it in an interrupt-free environment. We use it to ensure that no interrupt can occur as long as the <code>Mutex</code> is locked. When we run our kernel now, we see that it keeps running without hanging. (We still don’t notice any dots, but this is because they’re scrolling by too fast. 
Try to slow down the printing, e.g., by putting a <code>for _ in 0..10000 {}</code> inside the loop.)</p> <p>We can apply the same change to our serial printing function to ensure that no deadlocks occur with it either:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/serial.rs </span><span> </span><span>#[doc(hidden)] </span><span style="color:#569cd6;">pub fn </span><span>_print(args: ::core::fmt::Arguments) { </span><span> </span><span style="color:#569cd6;">use </span><span>core::fmt::Write; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::interrupts; </span><span style="color:#608b4e;">// new </span><span> </span><span> interrupts::without_interrupts(|| { </span><span style="color:#608b4e;">// new </span><span> </span><span style="color:#b4cea8;">SERIAL1 </span><span> .lock() </span><span> .write_fmt(args) </span><span> .expect(</span><span style="color:#d69d85;">&quot;Printing to serial failed&quot;</span><span>); </span><span> }); </span><span>} </span></code></pre> <p>Note that disabling interrupts shouldn’t be a general solution. The problem is that it increases the worst-case interrupt latency, i.e., the time until the system reacts to an interrupt. Therefore, interrupts should only be disabled for a very short time.</p> <h2 id="fixing-a-race-condition"><a class="zola-anchor" href="#fixing-a-race-condition" aria-label="Anchor link for: fixing-a-race-condition">🔗</a>Fixing a Race Condition</h2> <p>If you run <code>cargo test</code>, you might see the <code>test_println_output</code> test failing:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo test --lib </span><span>[…] </span><span>Running 4 tests </span><span>test_breakpoint_exception...[ok] </span><span>test_println... [ok] </span><span>test_println_many... [ok] </span><span>test_println_output... 
[failed] </span><span> </span><span>Error: panicked at &#39;assertion failed: `(left == right)` </span><span> left: `&#39;.&#39;`, </span><span> right: `&#39;S&#39;`&#39;, src/vga_buffer.rs:205:9 </span></code></pre> <p>The reason is a <em>race condition</em> between the test and our timer handler. Remember, the test looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>test_println_output() { </span><span> </span><span style="color:#569cd6;">let</span><span> s = </span><span style="color:#d69d85;">&quot;Some test string that fits on a single line&quot;</span><span>; </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, s); </span><span> </span><span style="color:#569cd6;">for </span><span>(i, c) </span><span style="color:#569cd6;">in</span><span> s.chars().enumerate() { </span><span> </span><span style="color:#569cd6;">let</span><span> screen_char = </span><span style="color:#b4cea8;">WRITER</span><span>.lock().buffer.chars[</span><span style="color:#b4cea8;">BUFFER_HEIGHT </span><span>- </span><span style="color:#b5cea8;">2</span><span>][i].read(); </span><span> assert_eq!(</span><span style="color:#569cd6;">char</span><span>::from(screen_char.ascii_character), c); </span><span> } </span><span>} </span></code></pre> <p>The test prints a string to the VGA buffer and then checks the output by manually iterating over the <code>buffer.chars</code> array. The race condition occurs because the timer interrupt handler might run between the <code>println</code> and the reading of the screen characters. Note that this isn’t a dangerous <em>data race</em>, which Rust completely prevents at compile time. 
See the <a href="https://doc.rust-lang.org/nomicon/races.html"><em>Rustonomicon</em></a> for details.</p> <p>To fix this, we need to keep the <code>WRITER</code> locked for the complete duration of the test, so that the timer handler can’t write a <code>.</code> to the screen in between. The fixed test looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>test_println_output() { </span><span> </span><span style="color:#569cd6;">use </span><span>core::fmt::Write; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::interrupts; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> s = </span><span style="color:#d69d85;">&quot;Some test string that fits on a single line&quot;</span><span>; </span><span> interrupts::without_interrupts(|| { </span><span> </span><span style="color:#569cd6;">let mut</span><span> writer = </span><span style="color:#b4cea8;">WRITER</span><span>.lock(); </span><span> writeln!(writer, </span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, s).expect(</span><span style="color:#d69d85;">&quot;writeln failed&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">for </span><span>(i, c) </span><span style="color:#569cd6;">in</span><span> s.chars().enumerate() { </span><span> </span><span style="color:#569cd6;">let</span><span> screen_char = writer.buffer.chars[</span><span style="color:#b4cea8;">BUFFER_HEIGHT </span><span>- </span><span style="color:#b5cea8;">2</span><span>][i].read(); </span><span> assert_eq!(</span><span style="color:#569cd6;">char</span><span>::from(screen_char.ascii_character), c); 
</span><span> } </span><span> }); </span><span>} </span></code></pre> <p>We made the following changes:</p> <ul> <li>We keep the writer locked for the complete test by using the <code>lock()</code> method explicitly. Instead of <code>println</code>, we use the <a href="https://doc.rust-lang.org/core/macro.writeln.html"><code>writeln</code></a> macro that allows printing to an already locked writer.</li> <li>To avoid another deadlock, we disable interrupts for the test’s duration. Otherwise, the test might get interrupted while the writer is still locked.</li> <li>Since the timer interrupt handler can still run before the test, we print an additional newline <code>\n</code> before printing the string <code>s</code>. This way, we avoid a test failure when the timer handler has already printed some <code>.</code> characters to the current line.</li> </ul> <p>With the above changes, <code>cargo test</code> now deterministically succeeds again.</p> <p>This was a very harmless race condition that only caused a test failure. As you can imagine, other race conditions can be much more difficult to debug due to their non-deterministic nature. Luckily, Rust protects us from data races, which are the most serious class of race conditions since they can cause all kinds of undefined behavior, including system crashes and silent memory corruption.</p> <h2 id="the-hlt-instruction"><a class="zola-anchor" href="#the-hlt-instruction" aria-label="Anchor link for: the-hlt-instruction">🔗</a>The <code>hlt</code> Instruction</h2> <p>Until now, we used a simple empty loop statement at the end of our <code>_start</code> and <code>panic</code> functions. This causes the CPU to spin endlessly, and thus works as expected. But it is also very inefficient, because the CPU continues to run at full speed even though there’s no work to do. 
You can see this problem in your task manager when you run your kernel: The QEMU process needs close to 100% CPU the whole time.</p> <p>What we really want to do is to halt the CPU until the next interrupt arrives. This allows the CPU to enter a sleep state in which it consumes much less energy. The <a href="https://en.wikipedia.org/wiki/HLT_(x86_instruction)"><code>hlt</code> instruction</a> does exactly that. Let’s use this instruction to create an energy-efficient endless loop:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>hlt_loop() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">loop </span><span>{ </span><span> x86_64::instructions::hlt(); </span><span> } </span><span>} </span></code></pre> <p>The <code>instructions::hlt</code> function is just a <a href="https://github.com/rust-osdev/x86_64/blob/5e8e218381c5205f5777cb50da3ecac5d7e3b1ab/src/instructions/mod.rs#L16-L22">thin wrapper</a> around the assembly instruction. It is safe because there’s no way it can compromise memory safety.</p> <p>We can now use this <code>hlt_loop</code> instead of the endless loops in our <code>_start</code> and <code>panic</code> functions:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> […] </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> blog_os::hlt_loop(); </span><span style="color:#608b4e;">// new </span><span>} </span><span> </span><span> </span><span>#[cfg(not(test))] </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, info); </span><span> blog_os::hlt_loop(); </span><span style="color:#608b4e;">// new </span><span>} </span><span> </span></code></pre> <p>Let’s update our <code>lib.rs</code> as well:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#608b4e;">/// Entry point for `cargo test` </span><span>#[cfg(test)] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> init(); </span><span> test_main(); </span><span> hlt_loop(); </span><span style="color:#608b4e;">// new </span><span>} </span><span> </span><span style="color:#569cd6;">pub fn </span><span>test_panic_handler(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;[failed]</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>); </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;Error: {}</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>, info); </span><span> exit_qemu(QemuExitCode::Failed); </span><span> hlt_loop(); </span><span style="color:#608b4e;">// new </span><span>} </span></code></pre> <p>When we run our kernel now in QEMU, we see a much lower CPU usage.</p> <h2 id="keyboard-input"><a class="zola-anchor" href="#keyboard-input" aria-label="Anchor link for: keyboard-input">🔗</a>Keyboard Input</h2> <p>Now that we are able to handle interrupts from external devices, we are finally able to add support for keyboard input. This will allow us to interact with our kernel for the first time.</p> <aside class="post_aside"> <p>Note that we only describe how to handle <a href="https://en.wikipedia.org/wiki/PS/2_port">PS/2</a> keyboards here, not USB keyboards. However, the mainboard emulates USB keyboards as PS/2 devices to support older software, so we can safely ignore USB keyboards until we have USB support in our kernel.</p> </aside> <p>Like the hardware timer, the keyboard controller is already enabled by default. So when you press a key, the keyboard controller sends an interrupt to the PIC, which forwards it to the CPU. The CPU looks for a handler function in the IDT, but the corresponding entry is empty. Therefore, a double fault occurs.</p> <p>So let’s add a handler function for the keyboard interrupt. 
It’s quite similar to how we defined the handler for the timer interrupt; it just uses a different interrupt number:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span>#[derive(Debug, Clone, Copy)] </span><span>#[repr(u8)] </span><span style="color:#569cd6;">pub enum </span><span>InterruptIndex { </span><span> Timer = </span><span style="color:#b4cea8;">PIC_1_OFFSET</span><span>, </span><span> Keyboard, </span><span style="color:#608b4e;">// new </span><span>} </span><span> </span><span>lazy_static! { </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">IDT</span><span>: InterruptDescriptorTable = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = InterruptDescriptorTable::new(); </span><span> idt.breakpoint.set_handler_fn(breakpoint_handler); </span><span> […] </span><span> </span><span style="color:#608b4e;">// new </span><span> idt[InterruptIndex::Keyboard.as_usize()] </span><span> .set_handler_fn(keyboard_interrupt_handler); </span><span> </span><span> idt </span><span> }; </span><span>} </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;x86-interrupt&quot; </span><span style="color:#569cd6;">fn </span><span>keyboard_interrupt_handler( </span><span> _stack_frame: InterruptStackFrame) </span><span>{ </span><span> print!(</span><span style="color:#d69d85;">&quot;k&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#b4cea8;">PICS</span><span>.lock() </span><span> .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8()); </span><span> } </span><span>} </span></code></pre> <p>As we see from the graphic <a href="https://os.phil-opp.com/hardware-interrupts/#the-8259-pic">above</a>, the 
keyboard uses line 1 of the primary PIC. This means that it arrives at the CPU as interrupt 33 (1 + offset 32). We add this index as a new <code>Keyboard</code> variant to the <code>InterruptIndex</code> enum. We don’t need to specify the value explicitly, since it defaults to the previous value plus one, which is also 33. In the interrupt handler, we print a <code>k</code> and send the end of interrupt signal to the interrupt controller.</p> <p>We now see that a <code>k</code> appears on the screen when we press a key. However, this only works for the first key we press. Even if we continue to press keys, no more <code>k</code>s appear on the screen. This is because the keyboard controller won’t send another interrupt until we have read the so-called <em>scancode</em> of the pressed key.</p> <h3 id="reading-the-scancodes"><a class="zola-anchor" href="#reading-the-scancodes" aria-label="Anchor link for: reading-the-scancodes">🔗</a>Reading the Scancodes</h3> <p>To find out <em>which</em> key was pressed, we need to query the keyboard controller. 
We do this by reading from the data port of the PS/2 controller, which is the <a href="https://os.phil-opp.com/testing/#i-o-ports">I/O port</a> with the number <code>0x60</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;x86-interrupt&quot; </span><span style="color:#569cd6;">fn </span><span>keyboard_interrupt_handler( </span><span> _stack_frame: InterruptStackFrame) </span><span>{ </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::port::Port; </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> port = Port::new(</span><span style="color:#b5cea8;">0x60</span><span>); </span><span> </span><span style="color:#569cd6;">let</span><span> scancode: </span><span style="color:#569cd6;">u8 </span><span>= </span><span style="color:#569cd6;">unsafe </span><span>{ port.read() }; </span><span> print!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, scancode); </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#b4cea8;">PICS</span><span>.lock() </span><span> .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8()); </span><span> } </span><span>} </span></code></pre> <p>We use the <a href="https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html"><code>Port</code></a> type of the <code>x86_64</code> crate to read a byte from the keyboard’s data port. This byte is called the <a href="https://en.wikipedia.org/wiki/Scancode"><em>scancode</em></a> and it represents the key press/release. 
We don’t do anything with the scancode yet, other than print it to the screen:</p> <p><img src="https://os.phil-opp.com/hardware-interrupts/qemu-printing-scancodes.gif" alt="QEMU printing scancodes to the screen when keys are pressed" /></p> <p>The above image shows me slowly typing “123”. We see that adjacent keys have adjacent scancodes and that pressing a key causes a different scancode than releasing it. But how do we translate the scancodes to the actual key actions exactly?</p> <h3 id="interpreting-the-scancodes"><a class="zola-anchor" href="#interpreting-the-scancodes" aria-label="Anchor link for: interpreting-the-scancodes">🔗</a>Interpreting the Scancodes</h3> <p>There are three different standards for the mapping between scancodes and keys, the so-called <em>scancode sets</em>. All three go back to the keyboards of early IBM computers: the <a href="https://en.wikipedia.org/wiki/IBM_Personal_Computer_XT">IBM XT</a>, the <a href="https://en.wikipedia.org/wiki/IBM_3270_PC">IBM 3270 PC</a>, and the <a href="https://en.wikipedia.org/wiki/IBM_Personal_Computer/AT">IBM AT</a>. Later computers fortunately did not continue the trend of defining new scancode sets, but rather emulated the existing sets and extended them. Today, most keyboards can be configured to emulate any of the three sets.</p> <p>By default, PS/2 keyboards emulate scancode set 1 (“XT”). In this set, the lower 7 bits of a scancode byte define the key, and the most significant bit defines whether it’s a press (“0”) or a release (“1”). Keys that were not present on the original <a href="https://en.wikipedia.org/wiki/IBM_Personal_Computer_XT">IBM XT</a> keyboard, such as the enter key on the keypad, generate two scancodes in succession: a <code>0xe0</code> escape byte and then a byte representing the key. 
For a list of all set 1 scancodes and their corresponding keys, check out the <a href="https://wiki.osdev.org/Keyboard#Scan_Code_Set_1">OSDev Wiki</a>.</p> <p>To translate the scancodes to keys, we can use a <code>match</code> statement:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;x86-interrupt&quot; </span><span style="color:#569cd6;">fn </span><span>keyboard_interrupt_handler( </span><span> _stack_frame: InterruptStackFrame) </span><span>{ </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::port::Port; </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> port = Port::new(</span><span style="color:#b5cea8;">0x60</span><span>); </span><span> </span><span style="color:#569cd6;">let</span><span> scancode: </span><span style="color:#569cd6;">u8 </span><span>= </span><span style="color:#569cd6;">unsafe </span><span>{ port.read() }; </span><span> </span><span> </span><span style="color:#608b4e;">// new </span><span> </span><span style="color:#569cd6;">let</span><span> key = </span><span style="color:#569cd6;">match</span><span> scancode { </span><span> </span><span style="color:#b5cea8;">0x02 </span><span style="color:#569cd6;">=&gt; </span><span>Some(</span><span style="color:#d69d85;">&#39;1&#39;</span><span>), </span><span> </span><span style="color:#b5cea8;">0x03 </span><span style="color:#569cd6;">=&gt; </span><span>Some(</span><span style="color:#d69d85;">&#39;2&#39;</span><span>), </span><span> </span><span style="color:#b5cea8;">0x04 </span><span style="color:#569cd6;">=&gt; </span><span>Some(</span><span style="color:#d69d85;">&#39;3&#39;</span><span>), </span><span> </span><span style="color:#b5cea8;">0x05 </span><span style="color:#569cd6;">=&gt; 
</span><span>Some(</span><span style="color:#d69d85;">&#39;4&#39;</span><span>), </span><span> </span><span style="color:#b5cea8;">0x06 </span><span style="color:#569cd6;">=&gt; </span><span>Some(</span><span style="color:#d69d85;">&#39;5&#39;</span><span>), </span><span> </span><span style="color:#b5cea8;">0x07 </span><span style="color:#569cd6;">=&gt; </span><span>Some(</span><span style="color:#d69d85;">&#39;6&#39;</span><span>), </span><span> </span><span style="color:#b5cea8;">0x08 </span><span style="color:#569cd6;">=&gt; </span><span>Some(</span><span style="color:#d69d85;">&#39;7&#39;</span><span>), </span><span> </span><span style="color:#b5cea8;">0x09 </span><span style="color:#569cd6;">=&gt; </span><span>Some(</span><span style="color:#d69d85;">&#39;8&#39;</span><span>), </span><span> </span><span style="color:#b5cea8;">0x0a </span><span style="color:#569cd6;">=&gt; </span><span>Some(</span><span style="color:#d69d85;">&#39;9&#39;</span><span>), </span><span> </span><span style="color:#b5cea8;">0x0b </span><span style="color:#569cd6;">=&gt; </span><span>Some(</span><span style="color:#d69d85;">&#39;0&#39;</span><span>), </span><span> </span><span style="color:#569cd6;">_ =&gt; </span><span>None, </span><span> }; </span><span> </span><span style="color:#569cd6;">if let </span><span>Some(key) = key { </span><span> print!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, key); </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#b4cea8;">PICS</span><span>.lock() </span><span> .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8()); </span><span> } </span><span>} </span></code></pre> <p>The above code translates keypresses of the number keys 0-9 and ignores all other keys. 
It uses a <a href="https://doc.rust-lang.org/book/ch06-02-match.html">match</a> statement to assign a character or <code>None</code> to each scancode. It then uses <a href="https://doc.rust-lang.org/book/ch18-01-all-the-places-for-patterns.html#conditional-if-let-expressions"><code>if let</code></a> to destructure the optional <code>key</code>. By using the same variable name <code>key</code> in the pattern, we <a href="https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#shadowing">shadow</a> the previous declaration, which is a common pattern for destructuring <code>Option</code> types in Rust.</p> <p>Now we can write numbers:</p> <p><img src="https://os.phil-opp.com/hardware-interrupts/qemu-printing-numbers.gif" alt="QEMU printing numbers to the screen" /></p> <p>Translating the other keys works in the same way. Fortunately, there is a crate named <a href="https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/"><code>pc-keyboard</code></a> for translating scancodes of scancode sets 1 and 2, so we don’t have to implement this ourselves. 
To use the crate, we add it to our <code>Cargo.toml</code> and import it in our <code>lib.rs</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">pc-keyboard </span><span>= </span><span style="color:#d69d85;">&quot;0.7.0&quot; </span></code></pre> <p>Now we can use this crate to rewrite our <code>keyboard_interrupt_handler</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;x86-interrupt&quot; </span><span style="color:#569cd6;">fn </span><span>keyboard_interrupt_handler( </span><span> _stack_frame: InterruptStackFrame) </span><span>{ </span><span> </span><span style="color:#569cd6;">use </span><span>pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1}; </span><span> </span><span style="color:#569cd6;">use </span><span>spin::Mutex; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::port::Port; </span><span> </span><span> lazy_static! 
{ </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">KEYBOARD</span><span>: Mutex&lt;Keyboard&lt;layouts::Us104Key, ScancodeSet1&gt;&gt; = </span><span> Mutex::new(Keyboard::new(ScancodeSet1::new(), </span><span> layouts::Us104Key, HandleControl::Ignore) </span><span> ); </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> keyboard = </span><span style="color:#b4cea8;">KEYBOARD</span><span>.lock(); </span><span> </span><span style="color:#569cd6;">let mut</span><span> port = Port::new(</span><span style="color:#b5cea8;">0x60</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> scancode: </span><span style="color:#569cd6;">u8 </span><span>= </span><span style="color:#569cd6;">unsafe </span><span>{ port.read() }; </span><span> </span><span style="color:#569cd6;">if let </span><span>Ok(Some(key_event)) = keyboard.add_byte(scancode) { </span><span> </span><span style="color:#569cd6;">if let </span><span>Some(key) = keyboard.process_keyevent(key_event) { </span><span> </span><span style="color:#569cd6;">match</span><span> key { </span><span> DecodedKey::Unicode(character) </span><span style="color:#569cd6;">=&gt; </span><span>print!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, character), </span><span> DecodedKey::RawKey(key) </span><span style="color:#569cd6;">=&gt; </span><span>print!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, key), </span><span> } </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#b4cea8;">PICS</span><span>.lock() </span><span> .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8()); </span><span> } </span><span>} 
</span></code></pre> <p>We use the <code>lazy_static</code> macro to create a static <a href="https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html"><code>Keyboard</code></a> object protected by a Mutex. We initialize the <code>Keyboard</code> with a US keyboard layout and scancode set 1. The <a href="https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/enum.HandleControl.html"><code>HandleControl</code></a> parameter lets us map <code>ctrl+[a-z]</code> to the Unicode characters <code>U+0001</code> through <code>U+001A</code>. We don’t want to do that, so we use the <code>Ignore</code> option to handle <code>ctrl</code> like any other key.</p> <p>On each interrupt, we lock the Mutex, read the scancode from the keyboard controller, and pass it to the <a href="https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html#method.add_byte"><code>add_byte</code></a> method, which translates the scancode into an <code>Option&lt;KeyEvent&gt;</code> (wrapped in a <code>Result</code>, which is why the code matches on <code>Ok(Some(key_event))</code>). The <a href="https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.KeyEvent.html"><code>KeyEvent</code></a> contains the key that caused the event and whether it was a press or release event.</p> <p>To interpret this key event, we pass it to the <a href="https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html#method.process_keyevent"><code>process_keyevent</code></a> method, which translates the key event to a character, if possible. 
For example, it translates a press event of the <code>A</code> key to either a lowercase <code>a</code> character or an uppercase <code>A</code> character, depending on whether the shift key was pressed.</p> <p>With this modified interrupt handler, we can now write text:</p> <p><img src="https://os.phil-opp.com/hardware-interrupts/qemu-typing.gif" alt="Typing “Hello World” in QEMU" /></p> <h3 id="configuring-the-keyboard"><a class="zola-anchor" href="#configuring-the-keyboard" aria-label="Anchor link for: configuring-the-keyboard">🔗</a>Configuring the Keyboard</h3> <p>It’s possible to configure some aspects of a PS/2 keyboard, for example, which scancode set it should use. We won’t cover it here because this post is already long enough, but the OSDev Wiki has an overview of possible <a href="https://wiki.osdev.org/PS/2_Keyboard#Commands">configuration commands</a>.</p> <h2 id="summary"><a class="zola-anchor" href="#summary" aria-label="Anchor link for: summary">🔗</a>Summary</h2> <p>This post explained how to enable and handle external interrupts. We learned about the 8259 PIC and its primary/secondary layout, the remapping of the interrupt numbers, and the “end of interrupt” signal. We implemented handlers for the hardware timer and the keyboard and learned about the <code>hlt</code> instruction, which halts the CPU until the next interrupt.</p> <p>Now we are able to interact with our kernel and have some fundamental building blocks for creating a small shell or simple games.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>Timer interrupts are essential for an operating system because they provide a way to periodically interrupt the running process and let the kernel regain control. 
The kernel can then switch to a different process and create the illusion of multiple processes running in parallel.</p> <p>But before we can create processes or threads, we need a way to allocate memory for them. The next posts will explore memory management to provide this fundamental building block.</p> Double Faults Mon, 18 Jun 2018 00:00:00 +0000 https://os.phil-opp.com/double-fault-exceptions/ https://os.phil-opp.com/double-fault-exceptions/ <p>This post explores the double fault exception in detail, which occurs when the CPU fails to invoke an exception handler. By handling this exception, we avoid fatal <em>triple faults</em> that cause a system reset. To prevent triple faults in all cases, we also set up an <em>Interrupt Stack Table</em> to catch double faults on a separate kernel stack.</p> <span id="continue-reading"></span> <p>This blog is openly developed on <a href="https://github.com/phil-opp/blog_os">GitHub</a>. If you have any problems or questions, please open an issue there. You can also leave comments <a href="https://os.phil-opp.com/double-fault-exceptions/#comments">at the bottom</a>. The complete source code for this post can be found in the <a href="https://github.com/phil-opp/blog_os/tree/post-06"><code>post-06</code></a> branch.</p> <!-- fix for zola anchor checker (target is in template): <a id="comments"> --> <!-- toc --> <h2 id="what-is-a-double-fault"><a class="zola-anchor" href="#what-is-a-double-fault" aria-label="Anchor link for: what-is-a-double-fault">🔗</a>What is a Double Fault?</h2> <p>In simplified terms, a double fault is a special exception that occurs when the CPU fails to invoke an exception handler. For example, it occurs when a page fault is triggered but there is no page fault handler registered in the <a href="https://os.phil-opp.com/cpu-exceptions/#the-interrupt-descriptor-table">Interrupt Descriptor Table</a> (IDT). 
So it’s kind of similar to catch-all blocks in programming languages with exceptions, e.g., <code>catch(...)</code> in C++ or <code>catch(Exception e)</code> in Java or C#.</p> <p>A double fault behaves like a normal exception. It has the vector number <code>8</code> and we can define a normal handler function for it in the IDT. It is really important to provide a double fault handler, because if a double fault is unhandled, a fatal <em>triple fault</em> occurs. Triple faults can’t be caught, and most hardware reacts with a system reset.</p> <h3 id="triggering-a-double-fault"><a class="zola-anchor" href="#triggering-a-double-fault" aria-label="Anchor link for: triggering-a-double-fault">🔗</a>Triggering a Double Fault</h3> <p>Let’s provoke a double fault by triggering an exception for which we didn’t define a handler function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> </span><span> blog_os::init(); </span><span> </span><span> </span><span style="color:#608b4e;">// trigger a page fault </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> *(</span><span style="color:#b5cea8;">0xdeadbeef </span><span style="color:#569cd6;">as *mut u8</span><span>) = </span><span style="color:#b5cea8;">42</span><span>; </span><span> }; </span><span> </span><span> </span><span style="color:#608b4e;">// as before </span><span> #[cfg(test)] </span><span> test_main(); </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>We use <code>unsafe</code> to write to the invalid address <code>0xdeadbeef</code>. The virtual address is not mapped to a physical address in the page tables, so a page fault occurs. We haven’t registered a page fault handler in our <a href="https://os.phil-opp.com/cpu-exceptions/#the-interrupt-descriptor-table">IDT</a>, so a double fault occurs.</p> <p>When we start our kernel now, we see that it enters an endless boot loop. The reason for the boot loop is the following:</p> <ol> <li>The CPU tries to write to <code>0xdeadbeef</code>, which causes a page fault.</li> <li>The CPU looks at the corresponding entry in the IDT and sees that no handler function is specified. Thus, it can’t call the page fault handler and a double fault occurs.</li> <li>The CPU looks at the IDT entry of the double fault handler, but this entry does not specify a handler function either. Thus, a <em>triple</em> fault occurs.</li> <li>A triple fault is fatal. 
QEMU reacts to it like most real hardware and issues a system reset.</li> </ol> <p>So in order to prevent this triple fault, we need to either provide a handler function for page faults or a double fault handler. We want to avoid triple faults in all cases, so let’s start with a double fault handler that is invoked for all unhandled exception types.</p> <h2 id="a-double-fault-handler"><a class="zola-anchor" href="#a-double-fault-handler" aria-label="Anchor link for: a-double-fault-handler">🔗</a>A Double Fault Handler</h2> <p>A double fault is a normal exception with an error code, so we can specify a handler function similar to our breakpoint handler:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span>lazy_static! { </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">IDT</span><span>: InterruptDescriptorTable = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = InterruptDescriptorTable::new(); </span><span> idt.breakpoint.set_handler_fn(breakpoint_handler); </span><span> idt.double_fault.set_handler_fn(double_fault_handler); </span><span style="color:#608b4e;">// new </span><span> idt </span><span> }; </span><span>} </span><span> </span><span style="color:#608b4e;">// new </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;x86-interrupt&quot; </span><span style="color:#569cd6;">fn </span><span>double_fault_handler( </span><span> stack_frame: InterruptStackFrame, _error_code: </span><span style="color:#569cd6;">u64</span><span>) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> panic!(</span><span style="color:#d69d85;">&quot;EXCEPTION: DOUBLE FAULT</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">{:#?}&quot;</span><span>, stack_frame); </span><span>} </span></code></pre> <p>Our handler prints a short error message and dumps the exception stack frame. The error code of a double fault is always zero, so there’s no reason to print it. One difference from the breakpoint handler is that the double fault handler is <a href="https://doc.rust-lang.org/stable/rust-by-example/fn/diverging.html"><em>diverging</em></a>. The reason is that the <code>x86_64</code> architecture does not permit returning from a double fault exception.</p> <p>When we start our kernel now, we should see that the double fault handler is invoked:</p> <p><img src="https://os.phil-opp.com/double-fault-exceptions/qemu-catch-double-fault.png" alt="QEMU printing EXCEPTION: DOUBLE FAULT and the exception stack frame" /></p> <p>It worked! Here is what happened this time:</p> <ol> <li>The CPU tries to write to <code>0xdeadbeef</code>, which causes a page fault.</li> <li>Like before, the CPU looks at the corresponding entry in the IDT and sees that no handler function is defined. Thus, a double fault occurs.</li> <li>The CPU jumps to the – now present – double fault handler.</li> </ol> <p>The triple fault (and the boot loop) no longer occurs, since the CPU can now call the double fault handler.</p> <p>That was quite straightforward! So why do we need a whole post for this topic? Well, we’re now able to catch <em>most</em> double faults, but there are some cases where our current approach doesn’t suffice.</p> <h2 id="causes-of-double-faults"><a class="zola-anchor" href="#causes-of-double-faults" aria-label="Anchor link for: causes-of-double-faults">🔗</a>Causes of Double Faults</h2> <p>Before we look at the special cases, we need to know the exact causes of double faults. 
Above, we used a pretty vague definition:</p> <blockquote> <p>A double fault is a special exception that occurs when the CPU fails to invoke an exception handler.</p> </blockquote> <p>What does <em>“fails to invoke”</em> mean exactly? The handler is not present? The handler is <a href="http://pages.cs.wisc.edu/~remzi/OSTEP/vm-beyondphys.pdf">swapped out</a>? And what happens if a handler causes exceptions itself?</p> <p>For example, what happens if:</p> <ol> <li>a breakpoint exception occurs, but the corresponding handler function is swapped out?</li> <li>a page fault occurs, but the page fault handler is swapped out?</li> <li>a divide-by-zero handler causes a breakpoint exception, but the breakpoint handler is swapped out?</li> <li>our kernel overflows its stack and the <em>guard page</em> is hit?</li> </ol> <p>Fortunately, the AMD64 manual (<a href="https://www.amd.com/system/files/TechDocs/24593.pdf">PDF</a>) has an exact definition (in Section 8.2.9). According to it, a “double fault exception <em>can</em> occur when a second exception occurs during the handling of a prior (first) exception handler”. The <em>“can”</em> is important: Only very specific combinations of exceptions lead to a double fault. 
These combinations are:</p> <table><thead><tr><th>First Exception</th><th>Second Exception</th></tr></thead><tbody> <tr><td><a href="https://wiki.osdev.org/Exceptions#Division_Error">Divide-by-zero</a>,<br><a href="https://wiki.osdev.org/Exceptions#Invalid_TSS">Invalid TSS</a>,<br><a href="https://wiki.osdev.org/Exceptions#Segment_Not_Present">Segment Not Present</a>,<br><a href="https://wiki.osdev.org/Exceptions#Stack-Segment_Fault">Stack-Segment Fault</a>,<br><a href="https://wiki.osdev.org/Exceptions#General_Protection_Fault">General Protection Fault</a></td><td><a href="https://wiki.osdev.org/Exceptions#Invalid_TSS">Invalid TSS</a>,<br><a href="https://wiki.osdev.org/Exceptions#Segment_Not_Present">Segment Not Present</a>,<br><a href="https://wiki.osdev.org/Exceptions#Stack-Segment_Fault">Stack-Segment Fault</a>,<br><a href="https://wiki.osdev.org/Exceptions#General_Protection_Fault">General Protection Fault</a></td></tr> <tr><td><a href="https://wiki.osdev.org/Exceptions#Page_Fault">Page Fault</a></td><td><a href="https://wiki.osdev.org/Exceptions#Page_Fault">Page Fault</a>,<br><a href="https://wiki.osdev.org/Exceptions#Invalid_TSS">Invalid TSS</a>,<br><a href="https://wiki.osdev.org/Exceptions#Segment_Not_Present">Segment Not Present</a>,<br><a href="https://wiki.osdev.org/Exceptions#Stack-Segment_Fault">Stack-Segment Fault</a>,<br><a href="https://wiki.osdev.org/Exceptions#General_Protection_Fault">General Protection Fault</a></td></tr> </tbody></table> <p>So, for example, a divide-by-zero fault followed by a page fault is fine (the page fault handler is invoked), but a divide-by-zero fault followed by a general-protection fault leads to a double fault.</p> <p>With the help of this table, we can answer the first three of the above questions:</p> <ol> <li>If a breakpoint exception occurs and the corresponding handler function is swapped out, a <em>page fault</em> occurs and the <em>page fault handler</em> is invoked.</li> <li>If a page fault occurs and the 
page fault handler is swapped out, a <em>double fault</em> occurs and the <em>double fault handler</em> is invoked.</li> <li>If a divide-by-zero handler causes a breakpoint exception, the CPU tries to invoke the breakpoint handler. If the breakpoint handler is swapped out, a <em>page fault</em> occurs and the <em>page fault handler</em> is invoked.</li> </ol> <p>In fact, even the case of an exception without a handler function in the IDT follows this scheme: When the exception occurs, the CPU tries to read the corresponding IDT entry. Since the entry is 0, which is not a valid IDT entry, a <em>general protection fault</em> occurs. We did not define a handler function for the general protection fault either, so another general protection fault occurs. According to the table, this leads to a double fault.</p> <h3 id="kernel-stack-overflow"><a class="zola-anchor" href="#kernel-stack-overflow" aria-label="Anchor link for: kernel-stack-overflow">🔗</a>Kernel Stack Overflow</h3> <p>Let’s look at the fourth question:</p> <blockquote> <p>What happens if our kernel overflows its stack and the guard page is hit?</p> </blockquote> <p>A guard page is a special memory page at the bottom of a stack that makes it possible to detect stack overflows. The page is not mapped to any physical frame, so accessing it causes a page fault instead of silently corrupting other memory. The bootloader sets up a guard page for our kernel stack, so a stack overflow causes a <em>page fault</em>.</p> <p>When a page fault occurs, the CPU looks up the page fault handler in the IDT and tries to push the <a href="https://os.phil-opp.com/cpu-exceptions/#the-interrupt-stack-frame">interrupt stack frame</a> onto the stack. However, the current stack pointer still points to the non-present guard page. Thus, a second page fault occurs, which causes a double fault (according to the above table).</p> <p>So the CPU tries to call the <em>double fault handler</em> now. 
However, on a double fault, the CPU tries to push the exception stack frame, too. The stack pointer still points to the guard page, so a <em>third</em> page fault occurs, which causes a <em>triple fault</em> and a system reboot. So our current double fault handler can’t avoid a triple fault in this case.</p> <p>Let’s try it ourselves! We can easily provoke a kernel stack overflow by calling a function that recurses endlessly:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#608b4e;">// don&#39;t mangle the name of this function </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> </span><span> blog_os::init(); </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>stack_overflow() { </span><span> stack_overflow(); </span><span style="color:#608b4e;">// for each recursion, the return address is pushed </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">// trigger a stack overflow </span><span> stack_overflow(); </span><span> </span><span> […] </span><span style="color:#608b4e;">// test_main(), println(…), and loop {} </span><span>} </span></code></pre> <p>When we try this code in QEMU, we see that the system enters a boot loop again.</p> <p>So how can we avoid this problem? We can’t skip pushing the exception stack frame, since the CPU does it itself. 
So we need to somehow ensure that the stack is always valid when a double fault exception occurs. Fortunately, the x86_64 architecture has a solution to this problem.</p> <h2 id="switching-stacks"><a class="zola-anchor" href="#switching-stacks" aria-label="Anchor link for: switching-stacks">🔗</a>Switching Stacks</h2> <p>The x86_64 architecture is able to switch to a predefined, known-good stack when an exception occurs. This switch happens at the hardware level, so it can be performed before the CPU pushes the exception stack frame.</p> <p>The switching mechanism is implemented as an <em>Interrupt Stack Table</em> (IST). The IST is a table of 7 pointers to known-good stacks. In Rust-like pseudocode:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">struct </span><span>InterruptStackTable { </span><span> stack_pointers: [Option&lt;StackPointer&gt;; 7], </span><span>} </span></code></pre> <p>For each exception handler, we can choose one of the <code>stack_pointers</code> of the IST through the <em>Interrupt Stack Table Index</em> in the options field of the corresponding <a href="https://os.phil-opp.com/cpu-exceptions/#the-interrupt-descriptor-table">IDT entry</a>. For example, our double fault handler could use the first stack in the IST. Then the CPU automatically switches to this stack whenever a double fault occurs. This switch would happen before anything is pushed, preventing the triple fault.</p> <h3 id="the-ist-and-tss"><a class="zola-anchor" href="#the-ist-and-tss" aria-label="Anchor link for: the-ist-and-tss">🔗</a>The IST and TSS</h3> <p>The Interrupt Stack Table (IST) is part of a legacy structure called <em><a href="https://en.wikipedia.org/wiki/Task_state_segment">Task State Segment</a></em> (TSS). 
The TSS used to hold various pieces of information (e.g., processor register state) about a task in 32-bit mode and was, for example, used for <a href="https://wiki.osdev.org/Context_Switching#Hardware_Context_Switching">hardware context switching</a>. However, hardware context switching is no longer supported in 64-bit mode and the format of the TSS has changed completely.</p> <p>On x86_64, the TSS no longer holds any task-specific information at all. Instead, it holds two stack tables (the IST is one of them). The only common field between the 32-bit and 64-bit TSS is the pointer to the <a href="https://en.wikipedia.org/wiki/Task_state_segment#I.2FO_port_permissions">I/O port permissions bitmap</a>.</p> <p>The 64-bit TSS has the following format:</p> <table><thead><tr><th>Field</th><th>Type</th></tr></thead><tbody> <tr><td><span style="opacity: 0.5">(reserved)</span></td><td><code>u32</code></td></tr> <tr><td>Privilege Stack Table</td><td><code>[u64; 3]</code></td></tr> <tr><td><span style="opacity: 0.5">(reserved)</span></td><td><code>u64</code></td></tr> <tr><td>Interrupt Stack Table</td><td><code>[u64; 7]</code></td></tr> <tr><td><span style="opacity: 0.5">(reserved)</span></td><td><code>u64</code></td></tr> <tr><td><span style="opacity: 0.5">(reserved)</span></td><td><code>u16</code></td></tr> <tr><td>I/O Map Base Address</td><td><code>u16</code></td></tr> </tbody></table> <p>The <em>Privilege Stack Table</em> is used by the CPU when the privilege level changes. For example, if an exception occurs while the CPU is in user mode (privilege level 3), the CPU normally switches to kernel mode (privilege level 0) before invoking the exception handler. In that case, the CPU would switch to the 0th stack in the Privilege Stack Table (since 0 is the target privilege level). 
We don’t have any user-mode programs yet, so we will ignore this table for now.</p> <h3 id="creating-a-tss"><a class="zola-anchor" href="#creating-a-tss" aria-label="Anchor link for: creating-a-tss">🔗</a>Creating a TSS</h3> <p>Let’s create a new TSS that contains a separate double fault stack in its interrupt stack table. For that, we need a TSS struct. Fortunately, the <code>x86_64</code> crate already contains a <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/tss/struct.TaskStateSegment.html"><code>TaskStateSegment</code> struct</a> that we can use.</p> <p>We create the TSS in a new <code>gdt</code> module (the name will make sense later):</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">pub mod </span><span>gdt; </span><span> </span><span style="color:#608b4e;">// in src/gdt.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::VirtAddr; </span><span style="color:#569cd6;">use </span><span>x86_64::structures::tss::TaskStateSegment; </span><span style="color:#569cd6;">use </span><span>lazy_static::lazy_static; </span><span> </span><span style="color:#569cd6;">pub const </span><span style="color:#b4cea8;">DOUBLE_FAULT_IST_INDEX</span><span>: </span><span style="color:#569cd6;">u16 </span><span>= </span><span style="color:#b5cea8;">0</span><span>; </span><span> </span><span>lazy_static! 
{ </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">TSS</span><span>: TaskStateSegment = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> tss = TaskStateSegment::new(); </span><span> tss.interrupt_stack_table[</span><span style="color:#b4cea8;">DOUBLE_FAULT_IST_INDEX </span><span style="color:#569cd6;">as usize</span><span>] = { </span><span> </span><span style="color:#569cd6;">const </span><span style="color:#b4cea8;">STACK_SIZE</span><span>: </span><span style="color:#569cd6;">usize </span><span>= </span><span style="color:#b5cea8;">4096 </span><span>* </span><span style="color:#b5cea8;">5</span><span>; </span><span> </span><span style="color:#569cd6;">static mut </span><span style="color:#b4cea8;">STACK</span><span>: [</span><span style="color:#569cd6;">u8</span><span>; </span><span style="color:#b4cea8;">STACK_SIZE</span><span>] = [</span><span style="color:#b5cea8;">0</span><span>; </span><span style="color:#b4cea8;">STACK_SIZE</span><span>]; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> stack_start = VirtAddr::from_ptr(</span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;</span><span style="color:#b4cea8;">STACK </span><span>}); </span><span> </span><span style="color:#569cd6;">let</span><span> stack_end = stack_start + </span><span style="color:#b4cea8;">STACK_SIZE</span><span>; </span><span> stack_end </span><span> }; </span><span> tss </span><span> }; </span><span>} </span></code></pre> <p>We use <code>lazy_static</code> because Rust’s const evaluator is not yet powerful enough to do this initialization at compile time. We define that the 0th IST entry is the double fault stack (any other IST index would work too). Then we write the top address of a double fault stack to the 0th entry. 
We write the top address because stacks on x86 grow downwards, i.e., from high addresses to low addresses.</p> <p>We haven’t implemented memory management yet, so we don’t have a proper way to allocate a new stack. Instead, we use a <code>static mut</code> array as stack storage for now. The <code>unsafe</code> is required because the compiler can’t guarantee race freedom when mutable statics are accessed. It is important that it is a <code>static mut</code> and not an immutable <code>static</code>, because otherwise the bootloader will map it to a read-only page. We will replace this with a proper stack allocation in a later post. After that, the <code>unsafe</code> will no longer be needed here.</p> <p>Note that this double fault stack has no guard page that protects against stack overflow. This means that we should not do anything stack-intensive in our double fault handler because a stack overflow might corrupt the memory below the stack.</p> <h4 id="loading-the-tss"><a class="zola-anchor" href="#loading-the-tss" aria-label="Anchor link for: loading-the-tss">🔗</a>Loading the TSS</h4> <p>Now that we’ve created a new TSS, we need a way to tell the CPU that it should use it. Unfortunately, this is a bit cumbersome since the TSS uses the segmentation system (for historical reasons). Instead of loading the table directly, we need to add a new segment descriptor to the <a href="https://web.archive.org/web/20190217233448/https://www.flingos.co.uk/docs/reference/Global-Descriptor-Table/">Global Descriptor Table</a> (GDT). Then we can load our TSS by invoking the <a href="https://www.felixcloutier.com/x86/ltr"><code>ltr</code> instruction</a> with the respective GDT index. 
(This is the reason why we named our module <code>gdt</code>.)</p> <h3 id="the-global-descriptor-table"><a class="zola-anchor" href="#the-global-descriptor-table" aria-label="Anchor link for: the-global-descriptor-table">🔗</a>The Global Descriptor Table</h3> <p>The Global Descriptor Table (GDT) is a relic that was used for <a href="https://en.wikipedia.org/wiki/X86_memory_segmentation">memory segmentation</a> before paging became the de facto standard. While segmentation is no longer supported in 64-bit mode, the GDT still exists. It is mostly used for two things: switching between kernel space and user space, and loading a TSS structure.</p> <p>The GDT is a structure that contains the <em>segments</em> of the program. Older architectures used it to isolate programs from each other. For more information about segmentation, check out the chapter of the same name in the free <a href="http://pages.cs.wisc.edu/~remzi/OSTEP/">“Three Easy Pieces” book</a>.</p> <h4 id="creating-a-gdt"><a class="zola-anchor" href="#creating-a-gdt" aria-label="Anchor link for: creating-a-gdt">🔗</a>Creating a GDT</h4> <p>Let’s create a static <code>GDT</code> that includes a segment for our <code>TSS</code> static:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/gdt.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::gdt::{GlobalDescriptorTable, Descriptor}; </span><span> </span><span>lazy_static! 
{ </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">GDT</span><span>: GlobalDescriptorTable = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> gdt = GlobalDescriptorTable::new(); </span><span> gdt.add_entry(Descriptor::kernel_code_segment()); </span><span> gdt.add_entry(Descriptor::tss_segment(</span><span style="color:#569cd6;">&amp;</span><span style="color:#b4cea8;">TSS</span><span>)); </span><span> gdt </span><span> }; </span><span>} </span></code></pre> <p>As before, we use <code>lazy_static</code> again. We create a new GDT with a code segment and a TSS segment.</p> <h4 id="loading-the-gdt"><a class="zola-anchor" href="#loading-the-gdt" aria-label="Anchor link for: loading-the-gdt">🔗</a>Loading the GDT</h4> <p>To load our GDT, we create a new <code>gdt::init</code> function that we call from our <code>init</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/gdt.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init() { </span><span> </span><span style="color:#b4cea8;">GDT</span><span>.load(); </span><span>} </span><span> </span><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init() { </span><span> gdt::init(); </span><span> interrupts::init_idt(); </span><span>} </span></code></pre> <p>Now our GDT is loaded (since the <code>_start</code> function calls <code>init</code>), but we still see the boot loop on stack overflow.</p> <h3 id="the-final-steps"><a class="zola-anchor" href="#the-final-steps" aria-label="Anchor link for: the-final-steps">🔗</a>The Final Steps</h3> <p>The problem is that the GDT segments are not yet active because the segment and TSS registers still contain the values from the old GDT. 
We also need to modify the double fault IDT entry so that it uses the new stack.</p> <p>In summary, we need to do the following:</p> <ol> <li><strong>Reload the code segment register</strong>: We changed our GDT, so we should reload <code>cs</code>, the code segment register. This is required since the old segment selector could now point to a different GDT descriptor (e.g., a TSS descriptor).</li> <li><strong>Load the TSS</strong>: We loaded a GDT that contains a TSS descriptor, but we still need to tell the CPU that it should use that TSS.</li> <li><strong>Update the IDT entry</strong>: As soon as our TSS is loaded, the CPU has access to a valid interrupt stack table (IST). Then we can tell the CPU that it should use our new double fault stack by modifying our double fault IDT entry.</li> </ol> <p>For the first two steps, we need access to the segment selectors that <code>add_entry</code> returns, which we call <code>code_selector</code> and <code>tss_selector</code>, in our <code>gdt::init</code> function. We can achieve this by making them part of the static through a new <code>Selectors</code> struct:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/gdt.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::gdt::SegmentSelector; </span><span> </span><span>lazy_static! 
{ </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">GDT</span><span>: (GlobalDescriptorTable, Selectors) = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> gdt = GlobalDescriptorTable::new(); </span><span> </span><span style="color:#569cd6;">let</span><span> code_selector = gdt.add_entry(Descriptor::kernel_code_segment()); </span><span> </span><span style="color:#569cd6;">let</span><span> tss_selector = gdt.add_entry(Descriptor::tss_segment(</span><span style="color:#569cd6;">&amp;</span><span style="color:#b4cea8;">TSS</span><span>)); </span><span> (gdt, Selectors { code_selector, tss_selector }) </span><span> }; </span><span>} </span><span> </span><span style="color:#569cd6;">struct </span><span>Selectors { </span><span> code_selector: SegmentSelector, </span><span> tss_selector: SegmentSelector, </span><span>} </span></code></pre> <p>Now we can use the selectors to reload the <code>cs</code> register and load our <code>TSS</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/gdt.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init() { </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::tables::load_tss; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::segmentation::{</span><span style="color:#b4cea8;">CS</span><span>, Segment}; </span><span> </span><span> </span><span style="color:#b4cea8;">GDT</span><span>.</span><span style="color:#b5cea8;">0.</span><span>load(); </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#b4cea8;">CS</span><span>::set_reg(</span><span style="color:#b4cea8;">GDT</span><span>.</span><span style="color:#b5cea8;">1.</span><span>code_selector); </span><span> 
load_tss(</span><span style="color:#b4cea8;">GDT</span><span>.</span><span style="color:#b5cea8;">1.</span><span>tss_selector); </span><span> } </span><span>} </span></code></pre> <p>We reload the code segment register using <a href="https://docs.rs/x86_64/0.14.5/x86_64/instructions/segmentation/struct.CS.html#method.set_reg"><code>CS::set_reg</code></a> and load the TSS using <a href="https://docs.rs/x86_64/0.14.2/x86_64/instructions/tables/fn.load_tss.html"><code>load_tss</code></a>. The functions are marked as <code>unsafe</code>, so we need an <code>unsafe</code> block to invoke them. The reason is that it might be possible to break memory safety by loading invalid selectors.</p> <p>Now that we have loaded a valid TSS and interrupt stack table, we can set the stack index for our double fault handler in the IDT:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">use crate</span><span>::gdt; </span><span> </span><span>lazy_static! 
{ </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">IDT</span><span>: InterruptDescriptorTable = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = InterruptDescriptorTable::new(); </span><span> idt.breakpoint.set_handler_fn(breakpoint_handler); </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> idt.double_fault.set_handler_fn(double_fault_handler) </span><span> .set_stack_index(gdt::</span><span style="color:#b4cea8;">DOUBLE_FAULT_IST_INDEX</span><span>); </span><span style="color:#608b4e;">// new </span><span> } </span><span> </span><span> idt </span><span> }; </span><span>} </span></code></pre> <p>The <code>set_stack_index</code> method is unsafe because the caller must ensure that the used index is valid and not already used for another exception.</p> <p>That’s it! Now the CPU should switch to the double fault stack whenever a double fault occurs. Thus, we are able to catch <em>all</em> double faults, including kernel stack overflows:</p> <p><img src="https://os.phil-opp.com/double-fault-exceptions/qemu-double-fault-on-stack-overflow.png" alt="QEMU printing EXCEPTION: DOUBLE FAULT and a dump of the exception stack frame" /></p> <p>From now on, we should never see a triple fault again! To ensure that we don’t accidentally break the above, we should add a test for this.</p> <h2 id="a-stack-overflow-test"><a class="zola-anchor" href="#a-stack-overflow-test" aria-label="Anchor link for: a-stack-overflow-test">🔗</a>A Stack Overflow Test</h2> <p>To test our new <code>gdt</code> module and ensure that the double fault handler is correctly called on a stack overflow, we can add an integration test. 
The idea is to provoke a double fault in the test function and verify that the double fault handler is called.</p> <p>Let’s start with a minimal skeleton:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/stack_overflow.rs </span><span> </span><span>#![no_std] </span><span>#![no_main] </span><span> </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> unimplemented!(); </span><span>} </span><span> </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> blog_os::test_panic_handler(info) </span><span>} </span></code></pre> <p>Like our <code>panic_handler</code> test, the test will run <a href="https://os.phil-opp.com/testing/#no-harness-tests">without a test harness</a>. The reason is that we can’t continue execution after a double fault, so more than one test doesn’t make sense. 
To disable the test harness for the test, we add the following to our <code>Cargo.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[[</span><span style="color:#808080;">test</span><span>]] </span><span style="color:#569cd6;">name </span><span>= </span><span style="color:#d69d85;">&quot;stack_overflow&quot; </span><span style="color:#569cd6;">harness </span><span>= </span><span style="color:#569cd6;">false </span></code></pre> <p>Now <code>cargo test --test stack_overflow</code> should compile successfully. The test fails, of course, since the <code>unimplemented</code> macro panics.</p> <h3 id="implementing-start"><a class="zola-anchor" href="#implementing-start" aria-label="Anchor link for: implementing-start">🔗</a>Implementing <code>_start</code></h3> <p>The implementation of the <code>_start</code> function looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/stack_overflow.rs </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::serial_print; </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> serial_print!(</span><span style="color:#d69d85;">&quot;stack_overflow::stack_overflow...</span><span style="color:#e3bbab;">\t</span><span style="color:#d69d85;">&quot;</span><span>); </span><span> </span><span> blog_os::gdt::init(); </span><span> init_test_idt(); </span><span> </span><span> </span><span style="color:#608b4e;">// trigger a stack overflow </span><span> stack_overflow(); </span><span> </span><span> panic!(</span><span style="color:#d69d85;">&quot;Execution continued after stack overflow&quot;</span><span>); </span><span>} </span><span> </span><span>#[allow(unconditional_recursion)] </span><span style="color:#569cd6;">fn </span><span>stack_overflow() { </span><span> stack_overflow(); </span><span style="color:#608b4e;">// for each recursion, the return address is pushed </span><span> volatile::Volatile::new(</span><span style="color:#b5cea8;">0</span><span>).read(); </span><span style="color:#608b4e;">// prevent tail recursion optimizations </span><span>} </span></code></pre> <p>We call our <code>gdt::init</code> function to initialize a new GDT. Instead of calling our <code>interrupts::init_idt</code> function, we call an <code>init_test_idt</code> function that will be explained in a moment. The reason is that we want to register a custom double fault handler that does an <code>exit_qemu(QemuExitCode::Success)</code> instead of panicking.</p> <p>The <code>stack_overflow</code> function is almost identical to the function in our <code>main.rs</code>. The only difference is that at the end of the function, we perform an additional <a href="https://en.wikipedia.org/wiki/Volatile_(computer_programming)">volatile</a> read using the <a href="https://docs.rs/volatile/0.2.6/volatile/struct.Volatile.html"><code>Volatile</code></a> type to prevent a compiler optimization called <a href="https://en.wikipedia.org/wiki/Tail_call"><em>tail call elimination</em></a>. 
Among other things, this optimization allows the compiler to transform a function whose last statement is a recursive function call into a normal loop. Thus, no additional stack frame is created for the function call, so the stack usage remains constant.</p> <p>In our case, however, we want the stack overflow to happen, so we add a dummy volatile read statement at the end of the function, which the compiler is not allowed to remove. Thus, the function is no longer <em>tail recursive</em>, and the transformation into a loop is prevented. We also add the <code>allow(unconditional_recursion)</code> attribute to silence the compiler warning that the function recurses endlessly.</p> <h3 id="the-test-idt"><a class="zola-anchor" href="#the-test-idt" aria-label="Anchor link for: the-test-idt">🔗</a>The Test IDT</h3> <p>As noted above, the test needs its own IDT with a custom double fault handler. The implementation looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/stack_overflow.rs </span><span> </span><span style="color:#569cd6;">use </span><span>lazy_static::lazy_static; </span><span style="color:#569cd6;">use </span><span>x86_64::structures::idt::InterruptDescriptorTable; </span><span> </span><span>lazy_static! 
{ </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">TEST_IDT</span><span>: InterruptDescriptorTable = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = InterruptDescriptorTable::new(); </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> idt.double_fault </span><span> .set_handler_fn(test_double_fault_handler) </span><span> .set_stack_index(blog_os::gdt::</span><span style="color:#b4cea8;">DOUBLE_FAULT_IST_INDEX</span><span>); </span><span> } </span><span> </span><span> idt </span><span> }; </span><span>} </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init_test_idt() { </span><span> </span><span style="color:#b4cea8;">TEST_IDT</span><span>.load(); </span><span>} </span></code></pre> <p>The implementation is very similar to our normal IDT in <code>interrupts.rs</code>. Like in the normal IDT, we set a stack index in the IST for the double fault handler in order to switch to a separate stack. The <code>init_test_idt</code> function loads the IDT on the CPU through the <code>load</code> method.</p> <h3 id="the-double-fault-handler"><a class="zola-anchor" href="#the-double-fault-handler" aria-label="Anchor link for: the-double-fault-handler">🔗</a>The Double Fault Handler</h3> <p>The only missing piece is our double fault handler. 
It looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in tests/stack_overflow.rs </span><span> </span><span style="color:#569cd6;">use </span><span>blog_os::{exit_qemu, QemuExitCode, serial_println}; </span><span style="color:#569cd6;">use </span><span>x86_64::structures::idt::InterruptStackFrame; </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;x86-interrupt&quot; </span><span style="color:#569cd6;">fn </span><span>test_double_fault_handler( </span><span> _stack_frame: InterruptStackFrame, </span><span> _error_code: </span><span style="color:#569cd6;">u64</span><span>, </span><span>) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;[ok]&quot;</span><span>); </span><span> exit_qemu(QemuExitCode::Success); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>When the double fault handler is called, we exit QEMU with a success exit code, which marks the test as passed. Since integration tests are completely separate executables, we need to set the <code>#![feature(abi_x86_interrupt)]</code> attribute again at the top of our test file.</p> <p>Now we can run our test through <code>cargo test --test stack_overflow</code> (or <code>cargo test</code> to run all tests). As expected, we see the <code>stack_overflow... [ok]</code> output in the console. Try to comment out the <code>set_stack_index</code> line; it should cause the test to fail.</p> <h2 id="summary"><a class="zola-anchor" href="#summary" aria-label="Anchor link for: summary">🔗</a>Summary</h2> <p>In this post, we learned what a double fault is and under which conditions it occurs. 
We added a basic double fault handler that prints an error message and added an integration test for it.</p> <p>We also enabled the hardware-supported stack switching on double fault exceptions so that it also works on stack overflow. While implementing it, we learned about the task state segment (TSS), the contained interrupt stack table (IST), and the global descriptor table (GDT), which was used for segmentation on older architectures.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>The next post explains how to handle interrupts from external devices such as timers, keyboards, or network controllers. These hardware interrupts are very similar to exceptions, e.g., they are also dispatched through the IDT. However, unlike exceptions, they don’t arise directly on the CPU. Instead, an <em>interrupt controller</em> aggregates these interrupts and forwards them to the CPU depending on their priority. In the next post, we will explore the <a href="https://en.wikipedia.org/wiki/Intel_8259">Intel 8259</a> (“PIC”) interrupt controller and learn how to implement keyboard support.</p> CPU Exceptions Sun, 17 Jun 2018 00:00:00 +0000 https://os.phil-opp.com/cpu-exceptions/ https://os.phil-opp.com/cpu-exceptions/ <p>CPU exceptions occur in various erroneous situations, for example, when accessing an invalid memory address or when dividing by zero. To react to them, we have to set up an <em>interrupt descriptor table</em> that provides handler functions. At the end of this post, our kernel will be able to catch <a href="https://wiki.osdev.org/Exceptions#Breakpoint">breakpoint exceptions</a> and resume normal execution afterward.</p> <span id="continue-reading"></span> <p>This blog is openly developed on <a href="https://github.com/phil-opp/blog_os">GitHub</a>. If you have any problems or questions, please open an issue there. 
You can also leave comments <a href="https://os.phil-opp.com/cpu-exceptions/#comments">at the bottom</a>. The complete source code for this post can be found in the <a href="https://github.com/phil-opp/blog_os/tree/post-05"><code>post-05</code></a> branch.</p> <!-- fix for zola anchor checker (target is in template): <a id="comments"> --> <!-- toc --> <h2 id="overview"><a class="zola-anchor" href="#overview" aria-label="Anchor link for: overview">🔗</a>Overview</h2> <p>An exception signals that something is wrong with the current instruction. For example, the CPU issues an exception if the current instruction tries to divide by 0. When an exception occurs, the CPU interrupts its current work and immediately calls a specific exception handler function, depending on the exception type.</p> <p>On x86, there are about 20 different CPU exception types. The most important are:</p> <ul> <li><strong>Page Fault</strong>: A page fault occurs on illegal memory accesses. For example, if the current instruction tries to read from an unmapped page or tries to write to a read-only page.</li> <li><strong>Invalid Opcode</strong>: This exception occurs when the current instruction is invalid, for example, when we try to use new <a href="https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE instructions</a> on an old CPU that does not support them.</li> <li><strong>General Protection Fault</strong>: This is the exception with the broadest range of causes. It occurs on various kinds of access violations, such as trying to execute a privileged instruction in user-level code or writing reserved fields in configuration registers.</li> <li><strong>Double Fault</strong>: When an exception occurs, the CPU tries to call the corresponding handler function. If another exception occurs <em>while calling the exception handler</em>, the CPU raises a double fault exception. 
This exception also occurs when there is no handler function registered for an exception.</li> <li><strong>Triple Fault</strong>: If an exception occurs while the CPU tries to call the double fault handler function, it issues a fatal <em>triple fault</em>. We can’t catch or handle a triple fault. Most processors react by resetting themselves and rebooting the operating system.</li> </ul> <p>For the full list of exceptions, check out the <a href="https://wiki.osdev.org/Exceptions">OSDev wiki</a>.</p> <h3 id="the-interrupt-descriptor-table"><a class="zola-anchor" href="#the-interrupt-descriptor-table" aria-label="Anchor link for: the-interrupt-descriptor-table">🔗</a>The Interrupt Descriptor Table</h3> <p>In order to catch and handle exceptions, we have to set up a so-called <em>Interrupt Descriptor Table</em> (IDT). In this table, we can specify a handler function for each CPU exception. The hardware uses this table directly, so we need to follow a predefined format. Each entry must have the following 16-byte structure:</p> <table><thead><tr><th>Type</th><th>Name</th><th>Description</th></tr></thead><tbody> <tr><td>u16</td><td>Function Pointer [0:15]</td><td>The lower bits of the pointer to the handler function.</td></tr> <tr><td>u16</td><td>GDT selector</td><td>Selector of a code segment in the <a href="https://en.wikipedia.org/wiki/Global_Descriptor_Table">global descriptor table</a>.</td></tr> <tr><td>u16</td><td>Options</td><td>(see below)</td></tr> <tr><td>u16</td><td>Function Pointer [16:31]</td><td>The middle bits of the pointer to the handler function.</td></tr> <tr><td>u32</td><td>Function Pointer [32:63]</td><td>The remaining bits of the pointer to the handler function.</td></tr> <tr><td>u32</td><td>Reserved</td><td></td></tr> </tbody></table> <p>The options field has the following format:</p> <table><thead><tr><th>Bits</th><th>Name</th><th>Description</th></tr></thead><tbody> <tr><td>0-2</td><td>Interrupt Stack Table Index</td><td>0: Don’t switch stacks, 
1-7: Switch to the n-th stack in the Interrupt Stack Table when this handler is called.</td></tr> <tr><td>3-7</td><td>Reserved</td><td></td></tr> <tr><td>8</td><td>0: Interrupt Gate, 1: Trap Gate</td><td>If this bit is 0, interrupts are disabled when this handler is called.</td></tr> <tr><td>9-11</td><td>must be one</td><td></td></tr> <tr><td>12</td><td>must be zero</td><td></td></tr> <tr><td>13‑14</td><td>Descriptor Privilege Level (DPL)</td><td>The minimal privilege level required for calling this handler.</td></tr> <tr><td>15</td><td>Present</td><td></td></tr> </tbody></table> <p>Each exception has a predefined IDT index. For example, the invalid opcode exception has table index 6 and the page fault exception has table index 14. Thus, the hardware can automatically load the corresponding IDT entry for each exception. The <a href="https://wiki.osdev.org/Exceptions">Exception Table</a> in the OSDev wiki shows the IDT indexes of all exceptions in the “Vector nr.” column.</p> <p>When an exception occurs, the CPU roughly does the following:</p> <ol> <li>Push some registers on the stack, including the instruction pointer and the <a href="https://en.wikipedia.org/wiki/FLAGS_register">RFLAGS</a> register. (We will use these values later in this post.)</li> <li>Read the corresponding entry from the Interrupt Descriptor Table (IDT). 
For example, the CPU reads the entry at index 14 when a page fault occurs.</li> <li>Check if the entry is present and, if not, raise a double fault.</li> <li>Disable hardware interrupts if the entry is an interrupt gate (bit 40 not set).</li> <li>Load the specified <a href="https://en.wikipedia.org/wiki/Global_Descriptor_Table">GDT</a> selector into the CS (code segment).</li> <li>Jump to the specified handler function.</li> </ol> <p>Don’t worry about steps 4 and 5 for now; we will learn about the global descriptor table and hardware interrupts in future posts.</p> <h2 id="an-idt-type"><a class="zola-anchor" href="#an-idt-type" aria-label="Anchor link for: an-idt-type">🔗</a>An IDT Type</h2> <p>Instead of creating our own IDT type, we will use the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html"><code>InterruptDescriptorTable</code> struct</a> of the <code>x86_64</code> crate, which looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[repr(C)] </span><span style="color:#569cd6;">pub struct </span><span>InterruptDescriptorTable { </span><span> </span><span style="color:#569cd6;">pub </span><span>divide_by_zero: Entry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>debug: Entry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>non_maskable_interrupt: Entry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>breakpoint: Entry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>overflow: Entry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>bound_range_exceeded: Entry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>invalid_opcode: Entry&lt;HandlerFunc&gt;, </span><span> </span><span
style="color:#569cd6;">pub </span><span>device_not_available: Entry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>double_fault: Entry&lt;HandlerFuncWithErrCode&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>invalid_tss: Entry&lt;HandlerFuncWithErrCode&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>segment_not_present: Entry&lt;HandlerFuncWithErrCode&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>stack_segment_fault: Entry&lt;HandlerFuncWithErrCode&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>general_protection_fault: Entry&lt;HandlerFuncWithErrCode&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>page_fault: Entry&lt;PageFaultHandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>x87_floating_point: Entry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>alignment_check: Entry&lt;HandlerFuncWithErrCode&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>machine_check: Entry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>simd_floating_point: Entry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>virtualization: Entry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>security_exception: Entry&lt;HandlerFuncWithErrCode&gt;, </span><span> </span><span style="color:#608b4e;">// some fields omitted </span><span>} </span></code></pre> <p>The fields have the type <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.Entry.html"><code>idt::Entry&lt;F&gt;</code></a>, which is a struct that represents the fields of an IDT entry (see the table above). The type parameter <code>F</code> defines the expected handler function type. 
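</p> <p>To make the entry layout from the tables above more concrete, here is a minimal sketch of how one could model such an entry by hand. It is only an illustration under assumptions: the names <code>RawIdtEntry</code> and <code>options</code>, as well as the example address and selector, are hypothetical, and this is not how the <code>x86_64</code> crate defines its <code>Entry</code> type:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust">// Hypothetical hand-written model of one 16-byte IDT entry.
// The field order follows the entry table above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
#[repr(C)]
struct RawIdtEntry {
    pointer_low: u16,    // handler address, bits 0-15
    gdt_selector: u16,   // code segment selector in the GDT
    options: u16,        // see the options table above
    pointer_middle: u16, // handler address, bits 16-31
    pointer_high: u32,   // handler address, bits 32-63
    reserved: u32,
}

// Assembles the options field from its parts.
fn options(ist_index: u16, trap_gate: bool, dpl: u16, present: bool) -&gt; u16 {
    let mut bits = 0b111 &lt;&lt; 9;         // bits 9-11 must be one, bit 12 stays zero
    bits |= ist_index &amp; 0b111;         // bits 0-2: Interrupt Stack Table index
    bits |= (trap_gate as u16) &lt;&lt; 8;   // bit 8: 0 = interrupt gate, 1 = trap gate
    bits |= (dpl &amp; 0b11) &lt;&lt; 13;        // bits 13-14: descriptor privilege level
    bits |= (present as u16) &lt;&lt; 15;    // bit 15: present
    bits
}

impl RawIdtEntry {
    // Splits the 64-bit handler address into the three pointer fields.
    fn new(handler_addr: u64, gdt_selector: u16, options: u16) -&gt; Self {
        RawIdtEntry {
            pointer_low: handler_addr as u16,
            gdt_selector,
            options,
            pointer_middle: (handler_addr &gt;&gt; 16) as u16,
            pointer_high: (handler_addr &gt;&gt; 32) as u32,
            reserved: 0,
        }
    }
}

fn main() {
    // Present interrupt gate, no stack switch, privilege level 0.
    let opts = options(0, false, 0, true);
    assert_eq!(opts, 0x8E00);
    assert_eq!(std::mem::size_of::&lt;RawIdtEntry&gt;(), 16);

    // The address and selector are made-up example values.
    let entry = RawIdtEntry::new(0xffff_8000_0012_3456, 8, opts);
    println!(&quot;{:#x?}&quot;, entry);
}
</code></pre> <p>For a present interrupt gate without stack switching, the sketch yields the options value <code>0x8E00</code>, and the struct is exactly 16 bytes large, which matches the tables above. In our kernel, we don’t need to do any of this ourselves since the <code>x86_64</code> crate handles it for us.</p> <p>Back to the <code>InterruptDescriptorTable</code> struct: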
We see that some entries require a <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.HandlerFunc.html"><code>HandlerFunc</code></a> and some entries require a <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.HandlerFuncWithErrCode.html"><code>HandlerFuncWithErrCode</code></a>. The page fault even has its own special type: <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.PageFaultHandlerFunc.html"><code>PageFaultHandlerFunc</code></a>.</p> <p>Let’s look at the <code>HandlerFunc</code> type first:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">HandlerFunc </span><span>= </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;x86-interrupt&quot; </span><span style="color:#569cd6;">fn</span><span>(</span><span style="color:#569cd6;">_</span><span>: InterruptStackFrame); </span></code></pre> <p>It’s a <a href="https://doc.rust-lang.org/book/ch19-04-advanced-types.html#creating-type-synonyms-with-type-aliases">type alias</a> for an <code>extern "x86-interrupt" fn</code> type. The <code>extern</code> keyword defines a function with a <a href="https://doc.rust-lang.org/nomicon/ffi.html#foreign-calling-conventions">foreign calling convention</a> and is often used to communicate with C code (<code>extern "C" fn</code>). But what is the <code>x86-interrupt</code> calling convention?</p> <h2 id="the-interrupt-calling-convention"><a class="zola-anchor" href="#the-interrupt-calling-convention" aria-label="Anchor link for: the-interrupt-calling-convention">🔗</a>The Interrupt Calling Convention</h2> <p>Exceptions are quite similar to function calls: The CPU jumps to the first instruction of the called function and executes it. 
Afterwards, the CPU jumps to the return address and continues the execution of the parent function.</p> <p>However, there is a major difference between exceptions and function calls: A function call is invoked voluntarily by a compiler-inserted <code>call</code> instruction, while an exception might occur at <em>any</em> instruction. In order to understand the consequences of this difference, we need to examine function calls in more detail.</p> <p><a href="https://en.wikipedia.org/wiki/Calling_convention">Calling conventions</a> specify the details of a function call. For example, they specify where function parameters are placed (e.g. in registers or on the stack) and how results are returned. On x86_64 Linux, the following rules apply for C functions (specified in the <a href="https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf">System V ABI</a>):</p> <ul> <li>the first six integer arguments are passed in registers <code>rdi</code>, <code>rsi</code>, <code>rdx</code>, <code>rcx</code>, <code>r8</code>, <code>r9</code></li> <li>additional arguments are passed on the stack</li> <li>results are returned in <code>rax</code> and <code>rdx</code></li> </ul> <p>Note that Rust does not follow the C ABI (in fact, <a href="https://github.com/rust-lang/rfcs/issues/600">there isn’t even a Rust ABI yet</a>), so these rules apply only to functions declared as <code>extern "C" fn</code>.</p> <h3 id="preserved-and-scratch-registers"><a class="zola-anchor" href="#preserved-and-scratch-registers" aria-label="Anchor link for: preserved-and-scratch-registers">🔗</a>Preserved and Scratch Registers</h3> <p>The calling convention divides the registers into two parts: <em>preserved</em> and <em>scratch</em> registers.</p> <p>The values of <em>preserved</em> registers must remain unchanged across function calls. So a called function (the <em>“callee”</em>) is only allowed to overwrite these registers if it restores their original values before returning. 
Therefore, these registers are called <em>“callee-saved”</em>. A common pattern is to save these registers to the stack at the function’s beginning and restore them just before returning.</p> <p>In contrast, a called function is allowed to overwrite <em>scratch</em> registers without restrictions. If the caller wants to preserve the value of a scratch register across a function call, it needs to back it up before the function call (e.g., by pushing it to the stack) and restore it afterwards. So the scratch registers are <em>caller-saved</em>.</p> <p>On x86_64, the C calling convention specifies the following preserved and scratch registers:</p> <table><thead><tr><th>preserved registers</th><th>scratch registers</th></tr></thead><tbody> <tr><td><code>rbp</code>, <code>rbx</code>, <code>rsp</code>, <code>r12</code>, <code>r13</code>, <code>r14</code>, <code>r15</code></td><td><code>rax</code>, <code>rcx</code>, <code>rdx</code>, <code>rsi</code>, <code>rdi</code>, <code>r8</code>, <code>r9</code>, <code>r10</code>, <code>r11</code></td></tr> <tr><td><em>callee-saved</em></td><td><em>caller-saved</em></td></tr> </tbody></table> <p>The compiler knows these rules, so it generates the code accordingly. For example, most functions begin with a <code>push rbp</code>, which backs up <code>rbp</code> on the stack (because it’s a callee-saved register).</p> <h3 id="preserving-all-registers"><a class="zola-anchor" href="#preserving-all-registers" aria-label="Anchor link for: preserving-all-registers">🔗</a>Preserving all Registers</h3> <p>In contrast to function calls, exceptions can occur on <em>any</em> instruction. In most cases, we don’t even know at compile time if the generated code will cause an exception. For example, the compiler can’t know if an instruction causes a stack overflow or a page fault.</p> <p>Since we don’t know when an exception occurs, we can’t back up any registers beforehand.
This means we can’t use a calling convention that relies on caller-saved registers for exception handlers. Instead, we need a calling convention that preserves <em>all registers</em>. The <code>x86-interrupt</code> calling convention is such a calling convention, so it guarantees that all register values are restored to their original values on function return.</p> <p>Note that this does not mean all registers are saved to the stack at function entry. Instead, the compiler only backs up the registers that are overwritten by the function. This way, very efficient code can be generated for short functions that only use a few registers.</p> <h3 id="the-interrupt-stack-frame"><a class="zola-anchor" href="#the-interrupt-stack-frame" aria-label="Anchor link for: the-interrupt-stack-frame">🔗</a>The Interrupt Stack Frame</h3> <p>On a normal function call (using the <code>call</code> instruction), the CPU pushes the return address before jumping to the target function. On function return (using the <code>ret</code> instruction), the CPU pops this return address and jumps to it. So the stack frame of a normal function call looks like this:</p> <p><img src="https://os.phil-opp.com/cpu-exceptions/function-stack-frame.svg" alt="function stack frame" /></p> <p>For exception and interrupt handlers, however, pushing a return address would not suffice, since interrupt handlers often run in a different context (stack pointer, CPU flags, etc.). Instead, the CPU performs the following steps when an interrupt occurs:</p> <ol start="0"> <li><strong>Saving the old stack pointer</strong>: The CPU reads the stack pointer (<code>rsp</code>) and stack segment (<code>ss</code>) register values and remembers them in an internal buffer.</li> <li><strong>Aligning the stack pointer</strong>: An interrupt can occur at any instruction, so the stack pointer can have any value, too. 
However, some CPU instructions (e.g., some SSE instructions) require that the stack pointer be aligned on a 16-byte boundary, so the CPU performs such an alignment right after the interrupt.</li> <li><strong>Switching stacks</strong> (in some cases): A stack switch occurs when the CPU privilege level changes, for example, when a CPU exception occurs in a user-mode program. It is also possible to configure stack switches for specific interrupts using the so-called <em>Interrupt Stack Table</em> (described in the next post).</li> <li><strong>Pushing the old stack pointer</strong>: The CPU pushes the <code>rsp</code> and <code>ss</code> values from step 0 to the stack. This makes it possible to restore the original stack pointer when returning from an interrupt handler.</li> <li><strong>Pushing and updating the <code>RFLAGS</code> register</strong>: The <a href="https://en.wikipedia.org/wiki/FLAGS_register"><code>RFLAGS</code></a> register contains various control and status bits. On interrupt entry, the CPU changes some bits and pushes the old value.</li> <li><strong>Pushing the instruction pointer</strong>: Before jumping to the interrupt handler function, the CPU pushes the instruction pointer (<code>rip</code>) and the code segment (<code>cs</code>). This is comparable to the return address push of a normal function call.</li> <li><strong>Pushing an error code</strong> (for some exceptions): For some specific exceptions, such as page faults, the CPU pushes an error code, which describes the cause of the exception.</li> <li><strong>Invoking the interrupt handler</strong>: The CPU reads the address and the segment descriptor of the interrupt handler function from the corresponding field in the IDT. 
It then invokes this handler by loading the values into the <code>rip</code> and <code>cs</code> registers.</li> </ol> <p>So the <em>interrupt stack frame</em> looks like this:</p> <p><img src="https://os.phil-opp.com/cpu-exceptions/exception-stack-frame.svg" alt="interrupt stack frame" /></p> <p>In the <code>x86_64</code> crate, the interrupt stack frame is represented by the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptStackFrame.html"><code>InterruptStackFrame</code></a> struct. It is passed to interrupt handlers by value (see the <code>HandlerFunc</code> signature above) and can be used to retrieve additional information about the exception’s cause. The struct contains no error code field, since only a few exceptions push an error code. These exceptions use the separate <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.HandlerFuncWithErrCode.html"><code>HandlerFuncWithErrCode</code></a> function type, which has an additional <code>error_code</code> argument.</p> <h3 id="behind-the-scenes"><a class="zola-anchor" href="#behind-the-scenes" aria-label="Anchor link for: behind-the-scenes">🔗</a>Behind the Scenes</h3> <p>The <code>x86-interrupt</code> calling convention is a powerful abstraction that hides almost all of the messy details of the exception handling process. However, sometimes it’s useful to know what’s happening behind the curtain. Here is a short overview of the things that the <code>x86-interrupt</code> calling convention takes care of:</p> <ul> <li><strong>Retrieving the arguments</strong>: Most calling conventions expect that the arguments are passed in registers. This is not possible for exception handlers since we must not overwrite any register values before backing them up on the stack.
Instead, the <code>x86-interrupt</code> calling convention is aware that the arguments already lie on the stack at a specific offset.</li> <li><strong>Returning using <code>iretq</code></strong>: Since the interrupt stack frame completely differs from stack frames of normal function calls, we can’t return from handler functions through the normal <code>ret</code> instruction. So instead, the <code>iretq</code> instruction must be used.</li> <li><strong>Handling the error code</strong>: The error code, which is pushed for some exceptions, makes things much more complex. It changes the stack alignment (see the next point) and needs to be popped off the stack before returning. The <code>x86-interrupt</code> calling convention handles all that complexity. However, it doesn’t know which handler function is used for which exception, so it needs to deduce that information from the number of function arguments. That means the programmer is still responsible for using the correct function type for each exception. Luckily, the <code>InterruptDescriptorTable</code> type defined by the <code>x86_64</code> crate ensures that the correct function types are used.</li> <li><strong>Aligning the stack</strong>: Some instructions (especially SSE instructions) require a 16-byte stack alignment. The CPU ensures this alignment whenever an exception occurs, but for some exceptions it destroys it again later when it pushes an error code. 
The <code>x86-interrupt</code> calling convention takes care of this by realigning the stack in this case.</li> </ul> <p>If you are interested in more details, we also have a series of posts that explain exception handling using <a href="https://github.com/rust-lang/rfcs/blob/master/text/1201-naked-fns.md">naked functions</a> linked <a href="https://os.phil-opp.com/cpu-exceptions/#too-much-magic">at the end of this post</a>.</p> <h2 id="implementation"><a class="zola-anchor" href="#implementation" aria-label="Anchor link for: implementation">🔗</a>Implementation</h2> <p>Now that we’ve understood the theory, it’s time to handle CPU exceptions in our kernel. We’ll start by creating a new <code>interrupts</code> module in <code>src/interrupts.rs</code> with an <code>init_idt</code> function that creates a new <code>InterruptDescriptorTable</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">pub mod </span><span>interrupts; </span><span> </span><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::idt::InterruptDescriptorTable; </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init_idt() { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = InterruptDescriptorTable::new(); </span><span>} </span></code></pre> <p>Now we can add handler functions. We start by adding a handler for the <a href="https://wiki.osdev.org/Exceptions#Breakpoint">breakpoint exception</a>. The breakpoint exception is the perfect exception to test exception handling.
Its only purpose is to temporarily pause a program when the breakpoint instruction <code>int3</code> is executed.</p> <p>The breakpoint exception is commonly used in debuggers: When the user sets a breakpoint, the debugger overwrites the corresponding instruction with the <code>int3</code> instruction so that the CPU throws the breakpoint exception when it reaches that line. When the user wants to continue the program, the debugger replaces the <code>int3</code> instruction with the original instruction again and continues the program. For more details, see the <a href="https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints">“<em>How debuggers work</em>”</a> series.</p> <p>For our use case, we don’t need to overwrite any instructions. Instead, we just want to print a message when the breakpoint instruction is executed and then continue the program. So let’s create a simple <code>breakpoint_handler</code> function and add it to our IDT:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::idt::{InterruptDescriptorTable, InterruptStackFrame}; </span><span style="color:#569cd6;">use crate</span><span>::println; </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init_idt() { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = InterruptDescriptorTable::new(); </span><span> idt.breakpoint.set_handler_fn(breakpoint_handler); </span><span>} </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;x86-interrupt&quot; </span><span style="color:#569cd6;">fn </span><span>breakpoint_handler( </span><span> stack_frame: InterruptStackFrame) </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;EXCEPTION: 
BREAKPOINT</span><span style="color:#e3bbab;">\n</span><span style="color:#b4cea8;">{:#?}</span><span style="color:#d69d85;">&quot;</span><span>, stack_frame); </span><span>} </span></code></pre> <p>Our handler just outputs a message and pretty-prints the interrupt stack frame.</p> <p>When we try to compile it, the following error occurs:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error[E0658]: x86-interrupt ABI is experimental and subject to change (see issue #40180) </span><span> --&gt; src/main.rs:53:1 </span><span> | </span><span>53 | / extern &quot;x86-interrupt&quot; fn breakpoint_handler(stack_frame: InterruptStackFrame) { </span><span>54 | | println!(&quot;EXCEPTION: BREAKPOINT\n{:#?}&quot;, stack_frame); </span><span>55 | | } </span><span> | |_^ </span><span> | </span><span> = help: add #![feature(abi_x86_interrupt)] to the crate attributes to enable </span></code></pre> <p>This error occurs because the <code>x86-interrupt</code> calling convention is still unstable. To use it anyway, we have to explicitly enable it by adding <code>#![feature(abi_x86_interrupt)]</code> at the top of our <code>lib.rs</code>.</p> <h3 id="loading-the-idt"><a class="zola-anchor" href="#loading-the-idt" aria-label="Anchor link for: loading-the-idt">🔗</a>Loading the IDT</h3> <p>In order for the CPU to use our new interrupt descriptor table, we need to load it using the <a href="https://www.felixcloutier.com/x86/lgdt:lidt"><code>lidt</code></a> instruction. The <code>InterruptDescriptorTable</code> struct of the <code>x86_64</code> crate provides a <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html#method.load"><code>load</code></a> method for that. 
Let’s try to use it:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init_idt() { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = InterruptDescriptorTable::new(); </span><span> idt.breakpoint.set_handler_fn(breakpoint_handler); </span><span> idt.load(); </span><span>} </span></code></pre> <p>When we try to compile it now, the following error occurs:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: `idt` does not live long enough </span><span> --&gt; src/interrupts/mod.rs:43:5 </span><span> | </span><span>43 | idt.load(); </span><span> | ^^^ does not live long enough </span><span>44 | } </span><span> | - borrowed value only lives until here </span><span> | </span><span> = note: borrowed value must be valid for the static lifetime... </span></code></pre> <p>So the <code>load</code> method expects a <code>&amp;'static self</code>, that is, a reference valid for the complete runtime of the program. The reason is that the CPU will access this table on every interrupt until we load a different IDT. So using a shorter lifetime than <code>'static</code> could lead to use-after-free bugs.</p> <p>In fact, this is exactly what happens here. Our <code>idt</code> is created on the stack, so it is only valid inside the <code>init_idt</code> function. Afterwards, the stack memory is reused for other functions, so the CPU would interpret random stack memory as the IDT. Luckily, the <code>InterruptDescriptorTable::load</code> method encodes this lifetime requirement in its function definition, so that the Rust compiler is able to prevent this possible bug at compile time.</p> <p>In order to fix this problem, we need to store our <code>idt</code> at a place where it has a <code>'static</code> lifetime. 
To achieve this, we could allocate our IDT on the heap using <a href="https://doc.rust-lang.org/std/boxed/struct.Box.html"><code>Box</code></a> and then convert it to a <code>'static</code> reference, but we are writing an OS kernel and thus don’t have a heap (yet).</p> <p>As an alternative, we could try to store the IDT as a <code>static</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">IDT</span><span>: InterruptDescriptorTable = InterruptDescriptorTable::new(); </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init_idt() { </span><span> </span><span style="color:#b4cea8;">IDT</span><span>.breakpoint.set_handler_fn(breakpoint_handler); </span><span> </span><span style="color:#b4cea8;">IDT</span><span>.load(); </span><span>} </span></code></pre> <p>However, there is a problem: Statics are immutable, so we can’t modify the breakpoint entry from our <code>init_idt</code> function. 
We could solve this problem by using a <a href="https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable"><code>static mut</code></a>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">static mut </span><span style="color:#b4cea8;">IDT</span><span>: InterruptDescriptorTable = InterruptDescriptorTable::new(); </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init_idt() { </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#b4cea8;">IDT</span><span>.breakpoint.set_handler_fn(breakpoint_handler); </span><span> </span><span style="color:#b4cea8;">IDT</span><span>.load(); </span><span> } </span><span>} </span></code></pre> <p>This variant compiles without errors but it’s far from idiomatic. <code>static mut</code>s are very prone to data races, so we need an <a href="https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#unsafe-superpowers"><code>unsafe</code> block</a> on each access.</p> <h4 id="lazy-statics-to-the-rescue"><a class="zola-anchor" href="#lazy-statics-to-the-rescue" aria-label="Anchor link for: lazy-statics-to-the-rescue">🔗</a>Lazy Statics to the Rescue</h4> <p>Fortunately, the <code>lazy_static</code> macro exists. Instead of evaluating a <code>static</code> at compile time, the macro performs the initialization when the <code>static</code> is referenced the first time. Thus, we can do almost everything in the initialization block and are even able to read runtime values.</p> <p>We already imported the <code>lazy_static</code> crate when we <a href="https://os.phil-opp.com/vga-text-mode/#lazy-statics">created an abstraction for the VGA text buffer</a>. 
So we can directly use the <code>lazy_static!</code> macro to create our static IDT:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">use </span><span>lazy_static::lazy_static; </span><span> </span><span>lazy_static! { </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">IDT</span><span>: InterruptDescriptorTable = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = InterruptDescriptorTable::new(); </span><span> idt.breakpoint.set_handler_fn(breakpoint_handler); </span><span> idt </span><span> }; </span><span>} </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init_idt() { </span><span> </span><span style="color:#b4cea8;">IDT</span><span>.load(); </span><span>} </span></code></pre> <p>Note how this solution requires no <code>unsafe</code> blocks. The <code>lazy_static!</code> macro does use <code>unsafe</code> behind the scenes, but it is abstracted away in a safe interface.</p> <h3 id="running-it"><a class="zola-anchor" href="#running-it" aria-label="Anchor link for: running-it">🔗</a>Running it</h3> <p>The last step for making exceptions work in our kernel is to call the <code>init_idt</code> function from our <code>main.rs</code>. 
Instead of calling it directly, we introduce a general <code>init</code> function in our <code>lib.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init() { </span><span> interrupts::init_idt(); </span><span>} </span></code></pre> <p>With this function, we now have a central place for initialization routines that can be shared between the different <code>_start</code> functions in our <code>main.rs</code>, <code>lib.rs</code>, and integration tests.</p> <p>Now we can update the <code>_start</code> function of our <code>main.rs</code> to call <code>init</code> and then trigger a breakpoint exception:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> </span><span> blog_os::init(); </span><span style="color:#608b4e;">// new </span><span> </span><span> </span><span style="color:#608b4e;">// invoke a breakpoint exception </span><span> x86_64::instructions::interrupts::int3(); </span><span style="color:#608b4e;">// new </span><span> </span><span> </span><span style="color:#608b4e;">// as before </span><span> #[cfg(test)] </span><span> test_main(); </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>When we run it in QEMU now (using <code>cargo run</code>), we see the following:</p> <p><img src="https://os.phil-opp.com/cpu-exceptions/qemu-breakpoint-exception.png" alt="QEMU printing EXCEPTION: BREAKPOINT and the interrupt stack frame" /></p> <p>It works! The CPU successfully invokes our breakpoint handler, which prints the message, and then returns back to the <code>_start</code> function, where the <code>It did not crash!</code> message is printed.</p> <p>We see that the interrupt stack frame tells us the instruction and stack pointers at the time when the exception occurred. This information is very useful when debugging unexpected exceptions.</p> <h3 id="adding-a-test"><a class="zola-anchor" href="#adding-a-test" aria-label="Anchor link for: adding-a-test">🔗</a>Adding a Test</h3> <p>Let’s create a test that ensures that the above continues to work. 
First, we update the <code>_start</code> function to also call <code>init</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#608b4e;">/// Entry point for `cargo test` </span><span>#[cfg(test)] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> init(); </span><span style="color:#608b4e;">// new </span><span> test_main(); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>Remember, this <code>_start</code> function is used when running <code>cargo test --lib</code>, since Rust tests the <code>lib.rs</code> completely independently of the <code>main.rs</code>. We need to call <code>init</code> here to set up an IDT before running the tests.</p> <p>Now we can create a <code>test_breakpoint_exception</code> test:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span>#[test_case] </span><span style="color:#569cd6;">fn </span><span>test_breakpoint_exception() { </span><span> </span><span style="color:#608b4e;">// invoke a breakpoint exception </span><span> x86_64::instructions::interrupts::int3(); </span><span>} </span></code></pre> <p>The test invokes the <code>int3</code> function to trigger a breakpoint exception. 
By checking that the execution continues afterward, we verify that our breakpoint handler is working correctly.</p> <p>You can try this new test by running <code>cargo test</code> (all tests) or <code>cargo test --lib</code> (only tests of <code>lib.rs</code> and its modules). You should see the following in the output:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>blog_os::interrupts::test_breakpoint_exception... [ok] </span></code></pre> <h2 id="too-much-magic"><a class="zola-anchor" href="#too-much-magic" aria-label="Anchor link for: too-much-magic">🔗</a>Too much Magic?</h2> <p>The <code>x86-interrupt</code> calling convention and the <a href="https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html"><code>InterruptDescriptorTable</code></a> type made the exception handling process relatively straightforward and painless. If this was too much magic for you and you would like to learn all the gory details of exception handling, we’ve got you covered: Our <a href="https://os.phil-opp.com/edition-1/extra/naked-exceptions/">“Handling Exceptions with Naked Functions”</a> series shows how to handle exceptions without the <code>x86-interrupt</code> calling convention and also creates its own IDT type. Historically, these posts were the main exception handling posts before the <code>x86-interrupt</code> calling convention and the <code>x86_64</code> crate existed. Note that these posts are based on the <a href="https://os.phil-opp.com/edition-1/">first edition</a> of this blog and might be out of date.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>We’ve successfully caught our first exception and returned from it! The next step is to ensure that we catch all exceptions because an uncaught exception causes a fatal <a href="https://wiki.osdev.org/Triple_Fault">triple fault</a>, which leads to a system reset. 
The next post explains how we can avoid this by correctly catching <a href="https://wiki.osdev.org/Double_Fault#Double_Fault">double faults</a>.</p> Integration Tests Fri, 15 Jun 2018 00:00:00 +0000 https://os.phil-opp.com/integration-tests/ https://os.phil-opp.com/integration-tests/ <p>To complete the testing picture we implement a basic integration test framework, which allows us to run tests on the target system. The idea is to run tests inside QEMU and report the results back to the host through the serial port.</p> <span id="continue-reading"></span> <p>This blog is openly developed on <a href="https://github.com/phil-opp/blog_os">GitHub</a>. If you have any problems or questions, please open an issue there. You can also leave comments <a href="https://os.phil-opp.com/integration-tests/#comments">at the bottom</a>. The complete source code for this post can be found in the <a href="https://github.com/phil-opp/blog_os/tree/post-05"><code>post-05</code></a> branch.</p> <!-- fix for zola anchor checker (target is in template): <a id="comments"> --> <!-- toc --> <h2 id="requirements"><a class="zola-anchor" href="#requirements" aria-label="Anchor link for: requirements">🔗</a>Requirements</h2> <p>This post builds upon the <a href="https://os.phil-opp.com/unit-testing/"><em>Unit Testing</em></a> post, so you need to follow it first. Alternatively, consider reading the new <a href="https://os.phil-opp.com/testing/"><em>Testing</em></a> post instead, which replaces both <em>Unit Testing</em> and this post. The new post implements similar functionality, but integrates it directly in <code>cargo xtest</code>, so that both unit and integration tests run in a realistic environment inside QEMU.</p> <h2 id="overview"><a class="zola-anchor" href="#overview" aria-label="Anchor link for: overview">🔗</a>Overview</h2> <p>In the previous post we added support for unit tests. 
The goal of unit tests is to test small components in isolation to ensure that each of them works as intended. The tests are run on the host machine and thus shouldn’t rely on architecture-specific functionality.</p> <p>To test the interaction of the components, both with each other and the system environment, we can write <em>integration tests</em>. Compared to unit tests, integration tests are more complex, because they need to run in a realistic environment. What this means depends on the application type. For example, for webserver applications it often means setting up a database instance. For an operating system kernel like ours, it means that we run the tests on the target hardware without an underlying operating system.</p> <p>Running on the target architecture allows us to test all hardware-specific code such as the VGA buffer or the effects of <a href="https://en.wikipedia.org/wiki/Page_table">page table</a> modifications. It also allows us to verify that our kernel boots without problems and that no <a href="https://wiki.osdev.org/Exceptions">CPU exception</a> occurs.</p> <p>In this post we will implement a very basic test framework that runs integration tests inside instances of the <a href="https://www.qemu.org/">QEMU</a> virtual machine. It is not as realistic as running them on real hardware, but it is much simpler and should be sufficient as long as we only use standard hardware that is well supported in QEMU.</p> <h2 id="the-serial-port"><a class="zola-anchor" href="#the-serial-port" aria-label="Anchor link for: the-serial-port">🔗</a>The Serial Port</h2> <p>The naive way of doing an integration test would be to add some assertions in the code, launch QEMU, and manually check if a panic occurred or not. This is very cumbersome and not practical if we have hundreds of integration tests. 
So we want an automated solution that runs all tests and fails if not all of them pass.</p> <p>Such an automated test framework needs to know whether a test succeeded or failed. It can’t look at the screen output of QEMU, so we need a different way of retrieving the test results on the host system. A simple way to achieve this is by using the <a href="https://en.wikipedia.org/wiki/Serial_port">serial port</a>, an old interface standard which is no longer found in modern computers. It is easy to program and QEMU can redirect the bytes sent over serial to the host’s standard output or a file.</p> <p>The chips implementing a serial interface are called <a href="https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter">UARTs</a>. There are <a href="https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter#UART_models">lots of UART models</a> on x86, but fortunately the only differences between them are some advanced features we don’t need. The common UARTs today are all compatible with the <a href="https://en.wikipedia.org/wiki/16550_UART">16550 UART</a>, so we will use that model for our testing framework.</p> <h3 id="port-i-o"><a class="zola-anchor" href="#port-i-o" aria-label="Anchor link for: port-i-o">🔗</a>Port I/O</h3> <p>There are two different approaches for communicating between the CPU and peripheral hardware on x86, <strong>memory-mapped I/O</strong> and <strong>port-mapped I/O</strong>. We already used memory-mapped I/O for accessing the <a href="https://os.phil-opp.com/vga-text-mode/">VGA text buffer</a> through the memory address <code>0xb8000</code>. This address is not mapped to RAM, but to some memory on the GPU.</p> <p>In contrast, port-mapped I/O uses a separate I/O bus for communication. Each connected peripheral has one or more port numbers. 
To communicate with such an I/O port there are special CPU instructions called <code>in</code> and <code>out</code>, which take a port number and a data byte (there are also variations of these commands that allow sending a <code>u16</code> or <code>u32</code>).</p> <p>The UART uses port-mapped I/O. Fortunately there are already several crates that provide abstractions for I/O ports and even UARTs, so we don’t need to invoke the <code>in</code> and <code>out</code> assembly instructions manually.</p> <h3 id="implementation"><a class="zola-anchor" href="#implementation" aria-label="Anchor link for: implementation">🔗</a>Implementation</h3> <p>We will use the <a href="https://docs.rs/uart_16550"><code>uart_16550</code></a> crate to initialize the UART and send data over the serial port. To add it as a dependency, we update our <code>Cargo.toml</code> and <code>main.rs</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">uart_16550 </span><span>= </span><span style="color:#d69d85;">&quot;0.1.0&quot; </span></code></pre> <p>The <code>uart_16550</code> crate contains a <code>SerialPort</code> struct that represents the UART registers, but we still need to construct an instance of it ourselves. 
For that we create a new <code>serial</code> module with the following content:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">mod </span><span>serial; </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/serial.rs </span><span> </span><span style="color:#569cd6;">use </span><span>uart_16550::SerialPort; </span><span style="color:#569cd6;">use </span><span>spin::Mutex; </span><span style="color:#569cd6;">use </span><span>lazy_static::lazy_static; </span><span> </span><span>lazy_static! { </span><span> </span><span style="color:#569cd6;">pub static ref </span><span style="color:#b4cea8;">SERIAL1</span><span>: Mutex&lt;SerialPort&gt; = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> serial_port = SerialPort::new(</span><span style="color:#b5cea8;">0x3F8</span><span>); </span><span> serial_port.init(); </span><span> Mutex::new(serial_port) </span><span> }; </span><span>} </span></code></pre> <p>Like with the <a href="https://os.phil-opp.com/vga-text-mode/#lazy-statics">VGA text buffer</a>, we use <code>lazy_static</code> and a spinlock to create a <code>static</code>. However, this time we use <code>lazy_static</code> to ensure that the <code>init</code> method is called before first use. 
We’re using the port address <code>0x3F8</code>, which is the standard port number for the first serial interface.</p> <p>To make the serial port easily usable, we add <code>serial_print!</code> and <code>serial_println!</code> macros:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[doc(hidden)] </span><span style="color:#569cd6;">pub fn </span><span>_print(args: ::core::fmt::Arguments) { </span><span> </span><span style="color:#569cd6;">use </span><span>core::fmt::Write; </span><span> </span><span style="color:#b4cea8;">SERIAL1</span><span>.lock().write_fmt(args).expect(</span><span style="color:#d69d85;">&quot;Printing to serial failed&quot;</span><span>); </span><span>} </span><span> </span><span style="color:#608b4e;">/// Prints to the host through the serial interface. </span><span>#[macro_export] </span><span>macro_rules! serial_print { </span><span> (</span><span style="color:#569cd6;">$</span><span>($arg:</span><span style="color:#569cd6;">tt</span><span>)</span><span style="color:#569cd6;">*</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> $crate::serial::_print(format_args!(</span><span style="color:#569cd6;">$</span><span>($arg)*)); </span><span> }; </span><span>} </span><span> </span><span style="color:#608b4e;">/// Prints to the host through the serial interface, appending a newline. </span><span>#[macro_export] </span><span>macro_rules! 
serial_println { </span><span> () </span><span style="color:#569cd6;">=&gt; </span><span>($crate::serial_print</span><span style="color:#569cd6;">!</span><span>(</span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>)); </span><span> ($fmt:</span><span style="color:#569cd6;">expr</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>($crate::serial_print</span><span style="color:#569cd6;">!</span><span>(concat!($fmt, </span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>))); </span><span> ($fmt:</span><span style="color:#569cd6;">expr</span><span>, </span><span style="color:#569cd6;">$</span><span>($arg:</span><span style="color:#569cd6;">tt</span><span>)</span><span style="color:#569cd6;">*</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>($crate::serial_print</span><span style="color:#569cd6;">!</span><span>( </span><span> concat!($fmt, </span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>), </span><span style="color:#569cd6;">$</span><span>($arg)*)); </span><span>} </span></code></pre> <p>The <code>SerialPort</code> type already implements the <a href="https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html"><code>fmt::Write</code></a> trait, so we don’t need to provide an implementation.</p> <p>Now we can print to the serial interface in our <code>main.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">mod </span><span>serial; </span><span> </span><span>#[cfg(not(test))] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span 
style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span style="color:#608b4e;">// prints to vga buffer </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;Hello Host{}&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>Note that the <code>serial_println</code> macro lives directly under the root namespace because we used the <code>#[macro_export]</code> attribute, so importing it through <code>use crate::serial::serial_println</code> will not work.</p> <h3 id="qemu-arguments"><a class="zola-anchor" href="#qemu-arguments" aria-label="Anchor link for: qemu-arguments">🔗</a>QEMU Arguments</h3> <p>To see the serial output in QEMU, we can use the <code>-serial</code> argument to redirect the output to stdout:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; qemu-system-x86_64 \ </span><span> -drive format=raw,file=target/x86_64-blog_os/debug/bootimage-blog_os.bin \ </span><span> -serial mon:stdio </span><span>warning: TCG doesn&#39;t support requested feature: CPUID.01H:ECX.vmx [bit 5] </span><span>Hello Host! </span></code></pre> <p>If you chose a different name than <code>blog_os</code>, you need to update the paths of course. Note that you can no longer exit QEMU through <code>Ctrl+c</code>. 
As an alternative you can use <code>Ctrl+a</code> and then <code>x</code>.</p> <p>As an alternative to this long command, we can pass the argument to <code>bootimage run</code>, with an additional <code>--</code> to separate the build arguments (passed to cargo) from the run arguments (passed to QEMU).</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>bootimage run -- -serial mon:stdio </span></code></pre> <p>Instead of standard output, QEMU supports <a href="https://qemu.weilnetz.de/doc/5.2/system/invocation.html#hxtool-9">many more target devices</a>. For redirecting the output to a file, the argument is:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>-serial file:output-file.txt </span></code></pre> <h2 id="shutting-down-qemu"><a class="zola-anchor" href="#shutting-down-qemu" aria-label="Anchor link for: shutting-down-qemu">🔗</a>Shutting Down QEMU</h2> <p>Right now we have an endless loop at the end of our <code>_start</code> function and need to close QEMU manually. This does not work for automated tests. We could try to kill QEMU automatically from the host, for example after some special output was sent over serial, but this would be a bit hacky and difficult to get right. The cleaner solution would be to implement a way to shut down our OS. Unfortunately this is relatively complex, because it requires implementing support for either the <a href="https://wiki.osdev.org/APM">APM</a> or <a href="https://wiki.osdev.org/ACPI">ACPI</a> power management standard.</p> <p>Luckily, there is an escape hatch: QEMU supports a special <code>isa-debug-exit</code> device, which provides an easy way to exit QEMU from the guest system. 
To enable it, we add the following argument to our QEMU command:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>-device isa-debug-exit,iobase=0xf4,iosize=0x04 </span></code></pre> <p>The <code>iobase</code> specifies on which port address the device should live (<code>0xf4</code> is a <a href="https://wiki.osdev.org/I/O_Ports#The_list">generally unused</a> port on the x86’s IO bus) and the <code>iosize</code> specifies the port size (<code>0x04</code> means four bytes). Now the guest can write a value to the <code>0xf4</code> port and QEMU will exit with <a href="https://en.wikipedia.org/wiki/Exit_status">exit status</a> <code>(passed_value &lt;&lt; 1) | 1</code>.</p> <p>To write to the I/O port, we use the <a href="https://docs.rs/x86_64/0.5.2/x86_64/"><code>x86_64</code></a> crate:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">x86_64 </span><span>= </span><span style="color:#d69d85;">&quot;0.5.2&quot; </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">pub unsafe fn </span><span>exit_qemu() { </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::port::Port; </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> port = Port::&lt;</span><span style="color:#569cd6;">u32</span><span>&gt;::new(</span><span style="color:#b5cea8;">0xf4</span><span>); </span><span> port.write(</span><span style="color:#b5cea8;">0</span><span>); </span><span>} </span></code></pre> <p>We mark the function as <code>unsafe</code> because it 
relies on the fact that a special QEMU device is attached to the I/O port with address <code>0xf4</code>. For the port type we choose <code>u32</code> because the <code>iosize</code> is 4 bytes. As the value, we write a zero, which causes QEMU to exit with exit status <code>(0 &lt;&lt; 1) | 1 = 1</code>.</p> <p>Note that we could also use the exit status instead of the serial interface for sending the test results, for example <code>1</code> for success (by writing <code>0</code>) and <code>3</code> for failure (by writing <code>1</code>). However, this wouldn’t allow us to send panic messages like the serial interface does and would also prevent us from replacing <code>exit_qemu</code> with a proper shutdown someday. Therefore we continue to use the serial interface and just always write a <code>0</code> to the port.</p> <p>We can now test the QEMU shutdown by calling <code>exit_qemu</code> from our <code>_start</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[cfg(not(test))] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span style="color:#608b4e;">// prints to vga buffer </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;Hello Host{}&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ exit_qemu(); } </span><span> </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>You should see that QEMU immediately closes after booting when executing:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>bootimage run -- -serial mon:stdio -device isa-debug-exit,iobase=0xf4,iosize=0x04 </span></code></pre> <h2 id="hiding-qemu"><a class="zola-anchor" href="#hiding-qemu" aria-label="Anchor link for: hiding-qemu">🔗</a>Hiding QEMU</h2> <p>We are now able to launch a QEMU instance that writes its output to the serial port and automatically exits itself when it’s done. So we no longer need the VGA buffer output or the graphical representation that still pops up. We can disable it by passing the <code>-display none</code> parameter to QEMU. 
The full command looks like this:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>qemu-system-x86_64 \ </span><span> -drive format=raw,file=target/x86_64-blog_os/debug/bootimage-blog_os.bin \ </span><span> -serial mon:stdio \ </span><span> -device isa-debug-exit,iobase=0xf4,iosize=0x04 \ </span><span> -display none </span></code></pre> <p>Or, with <code>bootimage run</code>:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>bootimage run -- \ </span><span> -serial mon:stdio \ </span><span> -device isa-debug-exit,iobase=0xf4,iosize=0x04 \ </span><span> -display none </span></code></pre> <p>Now QEMU runs completely in the background and no window is opened anymore. This is not only less annoying, but also allows our test framework to run in environments without a graphical user interface, such as <a href="https://travis-ci.com/">Travis CI</a>.</p> <h2 id="test-organization"><a class="zola-anchor" href="#test-organization" aria-label="Anchor link for: test-organization">🔗</a>Test Organization</h2> <p>Right now we’re doing the serial output and the QEMU exit from the <code>_start</code> function in our <code>main.rs</code> and can no longer run our kernel in a normal way. 
We could try to fix this by adding an <code>integration-test</code> <a href="https://doc.rust-lang.org/cargo/reference/features.html#the-features-section">cargo feature</a> and using <a href="https://doc.rust-lang.org/reference/conditional-compilation.html">conditional compilation</a>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">features</span><span>] </span><span style="color:#569cd6;">integration-test </span><span>= [] </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[cfg(not(feature </span><span style="color:#569cd6;">= </span><span style="color:#d69d85;">&quot;integration-test&quot;</span><span>))] </span><span style="color:#608b4e;">// new </span><span>#[cfg(not(test))] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span style="color:#608b4e;">// prints to vga buffer </span><span> </span><span> </span><span style="color:#608b4e;">// normal execution </span><span> </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span><span> </span><span>#[cfg(feature </span><span style="color:#569cd6;">= </span><span style="color:#d69d85;">&quot;integration-test&quot;</span><span>)] </span><span style="color:#608b4e;">// new </span><span>#[cfg(not(test))] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;Hello Host{}&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> </span><span> run_test_1(); </span><span> run_test_2(); </span><span> </span><span style="color:#608b4e;">// run more tests </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ exit_qemu(); } </span><span> </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>However, this approach has a big problem: All tests run in the same kernel instance, which means that they can influence each other. For example, if <code>run_test_1</code> misconfigures the system by loading an invalid <a href="https://en.wikipedia.org/wiki/Page_table">page table</a>, it can cause <code>run_test_2</code> to fail. 
This isn’t something that we want because it makes it very difficult to find the actual cause of an error.</p> <p>Instead, we want our test instances to be as independent as possible. If a test wants to destroy most of the system configuration to ensure that some property still holds in catastrophic situations, it should be able to do so without needing to restore a correct system state afterwards. This means that we need to launch a separate QEMU instance for each test.</p> <p>With the above conditional compilation we only have two modes: Run the kernel normally or execute <em>all</em> integration tests. To run each test in isolation we would need a separate cargo feature for each test with that approach, which would result in very complex <code>cfg</code> conditions and confusing code.</p> <p>A better solution is to create an additional executable for each test.</p> <h3 id="additional-test-executables"><a class="zola-anchor" href="#additional-test-executables" aria-label="Anchor link for: additional-test-executables">🔗</a>Additional Test Executables</h3> <p>Cargo allows us to add <a href="https://doc.rust-lang.org/cargo/guide/project-layout.html">additional executables</a> to a project by putting them inside <code>src/bin</code>. We can use that feature to create a separate executable for each integration test. 
For example, a <code>test-something</code> executable could be added like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// src/bin/test-something.rs </span><span> </span><span>#![cfg_attr(not(test), no_std)] </span><span>#![cfg_attr(not(test), no_main)] </span><span>#![cfg_attr(test, allow(unused_imports))] </span><span> </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span> </span><span>#[cfg(not(test))] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#608b4e;">// run tests </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span><span> </span><span>#[cfg(not(test))] </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(_info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>By providing a new implementation for <code>_start</code> we can create a minimal test case that only tests one specific thing and is independent of the rest. 
For example, if we don’t print anything to the VGA buffer, the test still succeeds even if the <code>vga_buffer</code> module is broken.</p> <p>We can now run this executable in QEMU by passing a <code>--bin</code> argument to <code>bootimage</code>:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>bootimage run --bin test-something </span></code></pre> <p>It should build the <code>test-something.rs</code> executable instead of <code>main.rs</code> and launch an empty QEMU window (since we don’t print anything). So this approach allows us to create completely independent executables without cargo features or conditional compilation, and without cluttering our <code>main.rs</code>.</p> <p>However, there is a problem: This is a completely separate executable, which means that we can’t access any functions from our <code>main.rs</code>, including <code>serial_println</code> and <code>exit_qemu</code>. Duplicating the code would work, but we would also need to copy everything we want to test. This would mean that we no longer test the original function but only a possibly outdated copy.</p> <p>Fortunately there is a way to share most of the code between our <code>main.rs</code> and the testing binaries: We move most of the code from our <code>main.rs</code> to a library that we can include from all executables.</p> <h3 id="split-off-a-library"><a class="zola-anchor" href="#split-off-a-library" aria-label="Anchor link for: split-off-a-library">🔗</a>Split Off A Library</h3> <p>Cargo supports hybrid projects that are both a library and a binary. 
We only need to create a <code>src/lib.rs</code> file and split the contents of our <code>main.rs</code> in the following way:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// src/lib.rs </span><span> </span><span>#![cfg_attr(not(test), no_std)] </span><span style="color:#608b4e;">// don&#39;t link the Rust standard library </span><span> </span><span style="color:#608b4e;">// NEW: We need to add `pub` here to make them accessible from the outside </span><span style="color:#569cd6;">pub mod </span><span>vga_buffer; </span><span style="color:#569cd6;">pub mod </span><span>serial; </span><span> </span><span style="color:#569cd6;">pub unsafe fn </span><span>exit_qemu() { </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::port::Port; </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> port = Port::&lt;</span><span style="color:#569cd6;">u32</span><span>&gt;::new(</span><span style="color:#b5cea8;">0xf4</span><span>); </span><span> port.write(</span><span style="color:#b5cea8;">0</span><span>); </span><span>} </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// src/main.rs </span><span> </span><span>#![cfg_attr(not(test), no_std)] </span><span>#![cfg_attr(not(test), no_main)] </span><span>#![cfg_attr(test, allow(unused_imports))] </span><span> </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span style="color:#569cd6;">use </span><span>blog_os::println; </span><span> </span><span style="color:#608b4e;">/// This function is the entry point, since the linker looks for a function </span><span style="color:#608b4e;">/// named `_start` by default. 
</span><span>#[cfg(not(test))] </span><span>#[no_mangle] </span><span style="color:#608b4e;">// don&#39;t mangle the name of this function </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span><span> </span><span style="color:#608b4e;">/// This function is called on panic. </span><span>#[cfg(not(test))] </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, info); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>So we move everything except <code>_start</code> and <code>panic</code> to <code>lib.rs</code> and make the <code>vga_buffer</code> and <code>serial</code> modules public. Everything should work exactly as before, including <code>bootimage run</code> and <code>cargo test</code>. To run tests only for the library part of our crate and avoid the additional output we can execute <code>cargo test --lib</code>.</p> <h3 id="test-basic-boot"><a class="zola-anchor" href="#test-basic-boot" aria-label="Anchor link for: test-basic-boot">🔗</a>Test Basic Boot</h3> <p>We are finally able to create our first integration test executable. 
We start simple and only test that the basic boot sequence works and the <code>_start</code> function is called:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/bin/test-basic-boot.rs </span><span> </span><span>#![cfg_attr(not(test), no_std)] </span><span>#![cfg_attr(not(test), no_main)] </span><span style="color:#608b4e;">// disable all Rust-level entry points </span><span>#![cfg_attr(test, allow(unused_imports))] </span><span> </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span style="color:#569cd6;">use </span><span>blog_os::{exit_qemu, serial_println}; </span><span> </span><span style="color:#608b4e;">/// This function is the entry point, since the linker looks for a function </span><span style="color:#608b4e;">/// named `_start` by default. </span><span>#[cfg(not(test))] </span><span>#[no_mangle] </span><span style="color:#608b4e;">// don&#39;t mangle the name of this function </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;ok&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ exit_qemu(); } </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span><span> </span><span> </span><span style="color:#608b4e;">/// This function is called on panic. </span><span>#[cfg(not(test))] </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;failed&quot;</span><span>); </span><span> </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;{}&quot;</span><span>, info); </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ exit_qemu(); } </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>We don’t do anything special here; we just print <code>ok</code> if <code>_start</code> is called and <code>failed</code> with the panic message when a panic occurs. Let’s try it:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; bootimage run --bin test-basic-boot -- \ </span><span> -serial mon:stdio -display none \ </span><span> -device isa-debug-exit,iobase=0xf4,iosize=0x04 </span><span>Building kernel </span><span> Compiling blog_os v0.2.0 (file:///…/blog_os) </span><span> Finished dev [unoptimized + debuginfo] target(s) in 0.19s </span><span> Updating registry `https://github.com/rust-lang/crates.io-index` </span><span>Creating disk image at target/x86_64-blog_os/debug/bootimage-test-basic-boot.bin </span><span>warning: TCG doesn&#39;t support requested feature: CPUID.01H:ECX.vmx [bit 5] </span><span>ok </span></code></pre> <p>We got our <code>ok</code>, so it worked! 
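</p> <p>The <code>ok</code>/<code>failed</code> convention used by this test is simple enough to check mechanically on the host. As a rough sketch only (the <code>TestOutcome</code> type and the <code>classify</code> function are hypothetical names made up for this illustration and are not part of <code>bootimage</code> or <code>blog_os</code>), captured serial output could be interpreted like this, assuming the first line is either <code>ok</code> or <code>failed</code>:</p>

```rust
// Hypothetical host-side helper: interprets the serial output of one test run.
// Assumption: the first line is `ok` or `failed`; later lines are details.
#[derive(Debug, PartialEq)]
enum TestOutcome {
    Ok,
    Failed(String), // carries the lines printed after `failed`
    Invalid,        // neither `ok` nor `failed` came first
}

fn classify(serial_output: String) -> TestOutcome {
    let mut lines = serial_output.lines();
    match lines.next().map(str::trim) {
        Some("ok") => TestOutcome::Ok,
        Some("failed") => {
            // Collect everything after the first line, e.g. a panic message.
            let mut details = String::new();
            for line in lines {
                details.push_str(line);
                details.push('\n');
            }
            TestOutcome::Failed(details)
        }
        _ => TestOutcome::Invalid,
    }
}

fn main() {
    println!("{:?}", classify(String::from("ok\n")));
    println!("{:?}", classify(String::from("failed\npanicked at 'explicit panic'\n")));
}
```

<p>This sketch ignores timeouts and QEMU’s own diagnostics; a real runner would have to handle both.</p> <p>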
Try inserting a <code>panic!()</code> before the <code>ok</code> printing, you should see output like this:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>failed </span><span>panicked at &#39;explicit panic&#39;, src/bin/test-basic-boot.rs:19:5 </span></code></pre> <h3 id="test-panic"><a class="zola-anchor" href="#test-panic" aria-label="Anchor link for: test-panic">🔗</a>Test Panic</h3> <p>To test that our panic handler is really invoked on a panic, we create a <code>test-panic</code> test:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/bin/test-panic.rs </span><span> </span><span>#![cfg_attr(not(test), no_std)] </span><span>#![cfg_attr(not(test), no_main)] </span><span>#![cfg_attr(test, allow(unused_imports))] </span><span> </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span style="color:#569cd6;">use </span><span>blog_os::{exit_qemu, serial_println}; </span><span> </span><span>#[cfg(not(test))] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> panic!(); </span><span>} </span><span> </span><span>#[cfg(not(test))] </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(_info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> serial_println!(</span><span style="color:#d69d85;">&quot;ok&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ exit_qemu(); } </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>This executable is almost identical to <code>test-basic-boot</code>; the only difference is that we print <code>ok</code> from our panic handler and invoke an explicit <code>panic!()</code> in our <code>_start</code> function.</p> <h2 id="a-test-runner"><a class="zola-anchor" href="#a-test-runner" aria-label="Anchor link for: a-test-runner">🔗</a>A Test Runner</h2> <p>The final step is to create a test runner, a program that executes all integration tests and checks their results. The basic steps it should perform are:</p> <ul> <li>Look for integration tests in the current project, maybe by some convention (e.g. executables starting with <code>test-</code>).</li> <li>Run all integration tests and interpret their results. <ul> <li>Use a timeout to ensure that an endless loop does not block the test runner forever.</li> </ul> </li> <li>Report the test results to the user and set a successful or failing exit status.</li> </ul> <p>Such a test runner is useful to many projects, so we decided to add one to the <code>bootimage</code> tool.</p> <h3 id="bootimage-test"><a class="zola-anchor" href="#bootimage-test" aria-label="Anchor link for: bootimage-test">🔗</a>Bootimage Test</h3> <p>The test runner of the <code>bootimage</code> tool can be invoked via <code>bootimage test</code>. It uses the following conventions:</p> <ul> <li>All executables starting with <code>test-</code> are treated as integration tests.</li> <li>Tests must print either <code>ok</code> or <code>failed</code> over the serial port. 
When printing <code>failed</code> they can print additional information such as a panic message (on the following lines).</li> <li>Tests are run with a timeout of 1 minute. If the test has not completed in time, it is reported as “timed out”.</li> </ul> <p>The <code>test-basic-boot</code> and <code>test-panic</code> tests we created above begin with <code>test-</code> and follow the <code>ok</code>/<code>failed</code> conventions, so they should work with <code>bootimage test</code>:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; bootimage test </span><span>test-panic </span><span> Finished dev [unoptimized + debuginfo] target(s) in 0.01s </span><span>Ok </span><span> </span><span>test-basic-boot </span><span> Finished dev [unoptimized + debuginfo] target(s) in 0.01s </span><span>Ok </span><span> </span><span>test-something </span><span> Finished dev [unoptimized + debuginfo] target(s) in 0.01s </span><span>Timed Out </span><span> </span><span>The following tests failed: </span><span> test-something: TimedOut </span></code></pre> <p>We see that our <code>test-panic</code> and <code>test-basic-boot</code> succeeded and that the <code>test-something</code> test timed out after one minute. We no longer need <code>test-something</code>, so we delete it (if you haven’t done so already). Now <code>bootimage test</code> should execute successfully.</p> <h2 id="summary"><a class="zola-anchor" href="#summary" aria-label="Anchor link for: summary">🔗</a>Summary</h2> <p>In this post we learned about the serial port and port-mapped I/O and saw how to configure QEMU to print serial output to the command line. We also learned a trick for exiting QEMU without needing to implement a proper shutdown.</p> <p>We then split our crate into a library and binary part in order to create additional executables for integration tests. 
We added two example tests for testing that the <code>_start</code> function is correctly called and that a <code>panic</code> invokes our panic handler. Finally, we presented <code>bootimage test</code> as a basic test runner for our integration tests.</p> <p>We now have a working integration test framework and can finally start to implement functionality in our kernel. We will continue to use the test framework over the next posts to test new components we add.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>In the next post, we will explore <em>CPU exceptions</em>. These exceptions are thrown by the CPU when something illegal happens, such as a division by zero or an access to an unmapped memory page (a so-called “page fault”). Being able to catch and examine these exceptions is very important for debugging future errors. Exception handling is also very similar to the handling of hardware interrupts, which is required for keyboard support.</p> Unit Testing Sun, 29 Apr 2018 00:00:00 +0000 https://os.phil-opp.com/unit-testing/ https://os.phil-opp.com/unit-testing/ <p>This post explores unit testing in <code>no_std</code> executables using Rust’s built-in test framework. We will adjust our code so that <code>cargo test</code> works and add some basic unit tests to our VGA buffer module.</p> <span id="continue-reading"></span> <p>This blog is openly developed on <a href="https://github.com/phil-opp/blog_os">GitHub</a>. If you have any problems or questions, please open an issue there. You can also leave comments <a href="https://os.phil-opp.com/unit-testing/#comments">at the bottom</a>. 
The complete source code for this post can be found in the <a href="https://github.com/phil-opp/blog_os/tree/post-04"><code>post-04</code></a> branch.</p> <!-- fix for zola anchor checker (target is in template): <a id="comments"> --> <!-- toc --> <h2 id="requirements"><a class="zola-anchor" href="#requirements" aria-label="Anchor link for: requirements">🔗</a>Requirements</h2> <p>In this post we explore how to execute <code>cargo test</code> on the host system (as a normal Linux/Windows/macOS executable). This only works if you don’t have a <code>.cargo/config</code> file that sets a default target. If you followed the <a href="https://os.phil-opp.com/minimal-rust-kernel/"><em>Minimal Rust Kernel</em></a> post before 2019-04-27, you should be fine. If you followed it after that date, you need to remove the <code>build.target</code> key from your <code>.cargo/config</code> file and explicitly pass a target argument to <code>cargo xbuild</code>.</p> <p>Alternatively, consider reading the new <a href="https://os.phil-opp.com/testing/"><em>Testing</em></a> post instead. It sets up functionality similar to this post, but runs the tests in a realistic environment inside QEMU instead of on your host system.</p> <h2 id="unit-tests-for-no-std-binaries"><a class="zola-anchor" href="#unit-tests-for-no-std-binaries" aria-label="Anchor link for: unit-tests-for-no-std-binaries">🔗</a>Unit Tests for <code>no_std</code> Binaries</h2> <p>Rust has a <a href="https://doc.rust-lang.org/book/ch11-00-testing.html">built-in test framework</a> that is capable of running unit tests without the need to set anything up. Just create a function that checks some results through assertions and add the <code>#[test]</code> attribute to the function header. Then <code>cargo test</code> will automatically find and execute all test functions of your crate.</p> <p>Unfortunately it’s a bit more complicated for <code>no_std</code> applications such as our kernel. 
If we run <code>cargo test</code> (without adding any tests yet), we get the following error:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo test </span><span> Compiling blog_os v0.2.0 (file:///…/blog_os) </span><span>error[E0152]: duplicate lang item found: `panic_impl`. </span><span> --&gt; src/main.rs:35:1 </span><span> | </span><span>35 | / fn panic(info: &amp;PanicInfo) -&gt; ! { </span><span>36 | | println!(&quot;{}&quot;, info); </span><span>37 | | loop {} </span><span>38 | | } </span><span> | |_^ </span><span> | </span><span> = note: first defined in crate `std`. </span></code></pre> <p>The problem is that unit tests are built for the host machine, with the <code>std</code> library included. This makes sense because they should be able to run as a normal application on the host operating system. Since the standard library has its own <code>panic_handler</code> function, we get the above error. To fix it, we use <a href="https://doc.rust-lang.org/reference/conditional-compilation.html">conditional compilation</a> to include our implementation of the panic handler only in non-test environments:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span> </span><span>#[cfg(not(test))] </span><span style="color:#608b4e;">// only compile when the test flag is not set </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, info); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>The only change is the added <code>#[cfg(not(test))]</code> attribute. The <code>#[cfg(…)]</code> attribute ensures that the annotated item is only included if the passed condition is met. The <code>test</code> configuration is set when the crate is compiled for unit tests. Through <code>not(…)</code> we negate the condition so that the language item is only compiled for non-test builds.</p> <p>When we now try <code>cargo test</code> again, we get an ugly linker error:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: linking with `cc` failed: exit code: 1 </span><span> | </span><span> = note: &quot;cc&quot; &quot;-Wl,--as-needed&quot; &quot;-Wl,-z,noexecstack&quot; &quot;-m64&quot; &quot;-L&quot; &quot;/…/lib/rustlib/x86_64-unknown-linux-gnu/lib&quot; […] </span><span> = note: /…/blog_os-969bdb90d27730ed.2q644ojj2xqxddld.rcgu.o: In function `_start&#39;: </span><span> /…/blog_os/src/main.rs:17: multiple definition of `_start&#39; </span><span> /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/Scrt1.o:(.text+0x0): first defined here </span><span> /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/Scrt1.o: In function `_start&#39;: </span><span> (.text+0x20): undefined reference to `main&#39; </span><span> collect2: error: ld returned 1 exit status </span><span> </span></code></pre> <p>I shortened the output here because it is extremely verbose. The relevant part is at the bottom, after the second “note:”. 
We got two distinct errors here, “<em>multiple definition of <code>_start</code></em>” and “<em>undefined reference to <code>main</code></em>”.</p> <p>The reason for the first error is that the test framework injects its own <code>main</code> and <code>_start</code> functions, which will run the tests when invoked. So we get two functions named <code>_start</code> when compiling in test mode, one from the test framework and the one we defined ourselves. To fix this, we need to exclude our <code>_start</code> function in that case, which we can do by marking it as <code>#[cfg(not(test))]</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[cfg(not(test))] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ … } </span></code></pre> <p>The second problem is that we use the <code>#![no_main]</code> attribute for our crate, which suppresses any <code>main</code> generation, including the test <code>main</code>. 
To solve this, we use the <a href="https://chrismorgan.info/blog/rust-cfg_attr.html"><code>cfg_attr</code></a> attribute to conditionally enable the <code>no_main</code> attribute only in non-test mode:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#![cfg_attr(not(test), no_main)] </span><span style="color:#608b4e;">// instead of `#![no_main]` </span></code></pre> <p>Now <code>cargo test</code> works:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo test </span><span> Compiling blog_os v0.2.0 (file:///…/blog_os) </span><span> [some warnings] </span><span> Finished dev [unoptimized + debuginfo] target(s) in 0.98 secs </span><span> Running target/debug/deps/blog_os-1f08396a9eff0aa7 </span><span> </span><span>running 0 tests </span><span> </span><span>test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out </span></code></pre> <p>The test framework seems to work as intended. We don’t have any tests yet, but we already get a test result summary.</p> <h3 id="silencing-the-warnings"><a class="zola-anchor" href="#silencing-the-warnings" aria-label="Anchor link for: silencing-the-warnings">🔗</a>Silencing the Warnings</h3> <p>We get a few warnings about unused imports, because we no longer compile our <code>_start</code> function. To silence such unused code warnings, we can add the following to the top of our <code>main.rs</code>:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>#![cfg_attr(test, allow(unused_imports))] </span></code></pre> <p>Like before, the <code>cfg_attr</code> attribute sets the passed attribute if the passed condition holds. Here, we set the <code>allow(…)</code> attribute when compiling in test mode. 
We use the <code>allow</code> attribute to disable warnings for the <code>unused_imports</code> <em>lint</em>.</p> <p>Lints are classes of warnings, for example <code>dead_code</code> for unused code or <code>missing_docs</code> for missing documentation. Lints can be set to four different states:</p> <ul> <li><code>allow</code>: no errors, no warnings</li> <li><code>warn</code>: causes a warning</li> <li><code>deny</code>: causes a compilation error</li> <li><code>forbid</code>: like <code>deny</code>, but can’t be overridden</li> </ul> <p>Some lints are <code>allow</code> by default (such as <code>missing_docs</code>), others are <code>warn</code> by default (such as <code>dead_code</code>), and a few are even <code>deny</code> by default. The default can be overridden by the <code>allow</code>, <code>warn</code>, <code>deny</code> and <code>forbid</code> attributes. For a list of all lints, see <code>rustc -W help</code>. There is also the <a href="https://github.com/rust-lang-nursery/rust-clippy">clippy</a> project, which provides many additional lints.</p> <h3 id="including-the-standard-library"><a class="zola-anchor" href="#including-the-standard-library" aria-label="Anchor link for: including-the-standard-library">🔗</a>Including the Standard Library</h3> <p>Unit tests run on the host machine, so it’s possible to use the complete standard library inside them. 
To link the standard library in test mode, we can make the <code>#![no_std]</code> attribute conditional through <code>cfg_attr</code> too:</p> <pre data-lang="diff" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-diff "><code class="language-diff" data-lang="diff"><span>-#![no_std] </span><span>+#![cfg_attr(not(test), no_std)] </span></code></pre> <h2 id="testing-the-vga-module"><a class="zola-anchor" href="#testing-the-vga-module" aria-label="Anchor link for: testing-the-vga-module">🔗</a>Testing the VGA Module</h2> <p>Now that we have set up the test framework, we can add a first unit test for our <code>vga_buffer</code> module:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span>#[cfg(test)] </span><span style="color:#569cd6;">mod </span><span>test { </span><span> </span><span style="color:#569cd6;">use super</span><span>::*; </span><span> </span><span> #[test] </span><span> </span><span style="color:#569cd6;">fn </span><span>foo() {} </span><span>} </span></code></pre> <p>We add the test in an inline <code>test</code> submodule. This isn’t necessary, but it is a common way to separate test code from the rest of the module. By adding the <code>#[cfg(test)]</code> attribute, we ensure that the module is only compiled in test mode. Through <code>use super::*</code>, we import all items of the parent module (the <code>vga_buffer</code> module), so that we can test them easily.</p> <p>The <code>#[test]</code> attribute on the <code>foo</code> function tells the test framework that the function is a unit test. 
The framework will find it automatically, even if it’s private and inside a private module as in our case:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo test </span><span> Compiling blog_os v0.2.0 (file:///…/blog_os) </span><span> Finished dev [unoptimized + debuginfo] target(s) in 2.99 secs </span><span> Running target/debug/deps/blog_os-1f08396a9eff0aa7 </span><span> </span><span>running 1 test </span><span>test vga_buffer::test::foo ... ok </span><span> </span><span>test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out </span></code></pre> <p>We see that the test was found and executed. It didn’t panic, so it counts as passed.</p> <h3 id="constructing-a-writer"><a class="zola-anchor" href="#constructing-a-writer" aria-label="Anchor link for: constructing-a-writer">🔗</a>Constructing a Writer</h3> <p>In order to test the VGA methods, we first need to construct a <code>Writer</code> instance. Since we will need such an instance for other tests too, we create a separate function for it:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span>#[cfg(test)] </span><span style="color:#569cd6;">mod </span><span>test { </span><span> </span><span style="color:#569cd6;">use super</span><span>::*; </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>construct_writer() -&gt; Writer { </span><span> </span><span style="color:#569cd6;">use </span><span>std::boxed::Box; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> buffer = construct_buffer(); </span><span> Writer { </span><span> column_position: </span><span style="color:#b5cea8;">0</span><span>, </span><span> color_code: ColorCode::new(Color::Blue, Color::Magenta), </span><span> buffer: Box::leak(Box::new(buffer)), </span><span> } </span><span> } 
</span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>construct_buffer() -&gt; Buffer { … } </span><span>} </span></code></pre> <p>We set the initial column position to 0 and choose arbitrary colors for the foreground and background. The difficult part is the buffer construction, which is described in detail below. We then use <a href="https://doc.rust-lang.org/nightly/std/boxed/struct.Box.html#method.new"><code>Box::new</code></a> and <a href="https://doc.rust-lang.org/nightly/std/boxed/struct.Box.html#method.leak"><code>Box::leak</code></a> to transform the created <code>Buffer</code> into a <code>&amp;'static mut Buffer</code>, because the <code>buffer</code> field needs to be of that type.</p> <h4 id="buffer-construction"><a class="zola-anchor" href="#buffer-construction" aria-label="Anchor link for: buffer-construction">🔗</a>Buffer Construction</h4> <p>So how do we create a <code>Buffer</code> instance? Unfortunately, the naive approach does not work:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>construct_buffer() -&gt; Buffer { </span><span> Buffer { </span><span> chars: [[Volatile::new(empty_char()); </span><span style="color:#b4cea8;">BUFFER_WIDTH</span><span>]; </span><span style="color:#b4cea8;">BUFFER_HEIGHT</span><span>], </span><span> } </span><span>} </span><span> </span><span style="color:#569cd6;">fn </span><span>empty_char() -&gt; ScreenChar { </span><span> ScreenChar { </span><span> ascii_character: </span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39; &#39;</span><span>, </span><span> color_code: ColorCode::new(Color::Green, Color::Brown), </span><span> } </span><span>} </span></code></pre> <p>When running <code>cargo test</code>, the following error occurs:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error[E0277]: the trait bound 
`volatile::Volatile&lt;vga_buffer::ScreenChar&gt;: core::marker::Copy` is not satisfied </span><span> --&gt; src/vga_buffer.rs:186:21 </span><span> | </span><span>186 | chars: [[Volatile::new(empty_char); BUFFER_WIDTH]; BUFFER_HEIGHT], </span><span> | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `core::marker::Copy` is not implemented for `volatile::Volatile&lt;vga_buffer::ScreenChar&gt;` </span><span> | </span><span> = note: the `Copy` trait is required because the repeated element will be copied </span></code></pre> <p>The problem is that array construction in Rust requires that the contained type is <a href="https://doc.rust-lang.org/core/marker/trait.Copy.html"><code>Copy</code></a>. The <code>ScreenChar</code> is <code>Copy</code>, but the <code>Volatile</code> wrapper is not. There is currently no easy way to circumvent this without using <a href="https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html"><code>unsafe</code></a>, but fortunately there is the <a href="https://docs.rs/array-init"><code>array_init</code></a> crate that provides a safe interface for such operations.</p> <p>To use that crate, we add the following to our <code>Cargo.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span>[</span><span style="color:#808080;">dev-dependencies</span><span>] </span><span style="color:#569cd6;">array-init </span><span>= </span><span style="color:#d69d85;">&quot;0.0.3&quot; </span></code></pre> <p>Note that we’re using the <a href="https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#development-dependencies"><code>dev-dependencies</code></a> table instead of the <code>dependencies</code> table, because we only need the crate for <code>cargo test</code> and not for a normal build.</p> <p>Now we can fix our <code>construct_buffer</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" 
class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>construct_buffer() -&gt; Buffer { </span><span> </span><span style="color:#569cd6;">use </span><span>array_init::array_init; </span><span> </span><span> Buffer { </span><span> chars: array_init(|_| array_init(|_| Volatile::new(empty_char()))), </span><span> } </span><span>} </span></code></pre> <p>See the <a href="https://docs.rs/array-init">documentation of <code>array_init</code></a> for more information about using that crate.</p> <h3 id="testing-write-byte"><a class="zola-anchor" href="#testing-write-byte" aria-label="Anchor link for: testing-write-byte">🔗</a>Testing <code>write_byte</code></h3> <p>Now we’re finally able to write a first unit test that tests the <code>write_byte</code> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in vga_buffer.rs </span><span> </span><span style="color:#569cd6;">mod </span><span>test { </span><span> […] </span><span> </span><span> #[test] </span><span> </span><span style="color:#569cd6;">fn </span><span>write_byte() { </span><span> </span><span style="color:#569cd6;">let mut</span><span> writer = construct_writer(); </span><span> writer.write_byte(</span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39;X&#39;</span><span>); </span><span> writer.write_byte(</span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39;Y&#39;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">for </span><span>(i, row) </span><span style="color:#569cd6;">in</span><span> writer.buffer.chars.iter().enumerate() { </span><span> </span><span style="color:#569cd6;">for </span><span>(j, screen_char) </span><span style="color:#569cd6;">in</span><span> row.iter().enumerate() { </span><span> </span><span style="color:#569cd6;">let</span><span> 
screen_char = screen_char.read(); </span><span> </span><span style="color:#569cd6;">if</span><span> i == </span><span style="color:#b4cea8;">BUFFER_HEIGHT </span><span>- </span><span style="color:#b5cea8;">1 </span><span style="color:#569cd6;">&amp;&amp;</span><span> j == </span><span style="color:#b5cea8;">0 </span><span>{ </span><span> assert_eq!(screen_char.ascii_character, </span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39;X&#39;</span><span>); </span><span> assert_eq!(screen_char.color_code, writer.color_code); </span><span> } </span><span style="color:#569cd6;">else if</span><span> i == </span><span style="color:#b4cea8;">BUFFER_HEIGHT </span><span>- </span><span style="color:#b5cea8;">1 </span><span style="color:#569cd6;">&amp;&amp;</span><span> j == </span><span style="color:#b5cea8;">1 </span><span>{ </span><span> assert_eq!(screen_char.ascii_character, </span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39;Y&#39;</span><span>); </span><span> assert_eq!(screen_char.color_code, writer.color_code); </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> assert_eq!(screen_char, empty_char()); </span><span> } </span><span> } </span><span> } </span><span> } </span><span>} </span></code></pre> <p>We construct a <code>Writer</code>, write two bytes to it, and then check that the right screen characters were updated. When we run <code>cargo test</code>, we see that the test is executed and passes:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>running 1 test </span><span>test vga_buffer::test::write_byte ... ok </span><span> </span><span>test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out </span></code></pre> <p>Try to play around a bit with this function and verify that the test fails if you change something, e.g. 
if you write a third byte without adjusting the <code>for</code> loop.</p> <p>(If you’re getting a “binary operation <code>==</code> cannot be applied to type <code>vga_buffer::ScreenChar</code>” error, you also need to derive <a href="https://doc.rust-lang.org/nightly/core/cmp/trait.PartialEq.html"><code>PartialEq</code></a> for <code>ScreenChar</code> and <code>ColorCode</code>.)</p> <h3 id="testing-strings"><a class="zola-anchor" href="#testing-strings" aria-label="Anchor link for: testing-strings">🔗</a>Testing Strings</h3> <p>Let’s add a second unit test to test formatted output and newline behavior:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#569cd6;">mod </span><span>test { </span><span> […] </span><span> </span><span> #[test] </span><span> </span><span style="color:#569cd6;">fn </span><span>write_formatted() { </span><span> </span><span style="color:#569cd6;">use </span><span>core::fmt::Write; </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> writer = construct_writer(); </span><span> writeln!(&amp;mut writer, </span><span style="color:#d69d85;">&quot;a&quot;</span><span>).unwrap(); </span><span> writeln!(&amp;mut writer, </span><span style="color:#d69d85;">&quot;b</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;c&quot;</span><span>).unwrap(); </span><span> </span><span> </span><span style="color:#569cd6;">for </span><span>(i, row) </span><span style="color:#569cd6;">in</span><span> writer.buffer.chars.iter().enumerate() { </span><span> </span><span style="color:#569cd6;">for </span><span>(j, screen_char) </span><span style="color:#569cd6;">in</span><span> row.iter().enumerate() { </span><span> </span><span style="color:#569cd6;">let</span><span> 
screen_char = screen_char.read(); </span><span> </span><span style="color:#569cd6;">if</span><span> i == </span><span style="color:#b4cea8;">BUFFER_HEIGHT </span><span>- </span><span style="color:#b5cea8;">3 </span><span style="color:#569cd6;">&amp;&amp;</span><span> j == </span><span style="color:#b5cea8;">0 </span><span>{ </span><span> assert_eq!(screen_char.ascii_character, </span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39;a&#39;</span><span>); </span><span> assert_eq!(screen_char.color_code, writer.color_code); </span><span> } </span><span style="color:#569cd6;">else if</span><span> i == </span><span style="color:#b4cea8;">BUFFER_HEIGHT </span><span>- </span><span style="color:#b5cea8;">2 </span><span style="color:#569cd6;">&amp;&amp;</span><span> j == </span><span style="color:#b5cea8;">0 </span><span>{ </span><span> assert_eq!(screen_char.ascii_character, </span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39;b&#39;</span><span>); </span><span> assert_eq!(screen_char.color_code, writer.color_code); </span><span> } </span><span style="color:#569cd6;">else if</span><span> i == </span><span style="color:#b4cea8;">BUFFER_HEIGHT </span><span>- </span><span style="color:#b5cea8;">2 </span><span style="color:#569cd6;">&amp;&amp;</span><span> j == </span><span style="color:#b5cea8;">1 </span><span>{ </span><span> assert_eq!(screen_char.ascii_character, </span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39;c&#39;</span><span>); </span><span> assert_eq!(screen_char.color_code, writer.color_code); </span><span> } </span><span style="color:#569cd6;">else if</span><span> i &gt;= </span><span style="color:#b4cea8;">BUFFER_HEIGHT </span><span>- </span><span style="color:#b5cea8;">2 </span><span>{ </span><span> assert_eq!(screen_char.ascii_character, </span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39; &#39;</span><span>); </span><span> assert_eq!(screen_char.color_code, 
writer.color_code); </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> assert_eq!(screen_char, empty_char()); </span><span> } </span><span> } </span><span> } </span><span> } </span><span>} </span></code></pre> <p>In this test we’re using the <a href="https://doc.rust-lang.org/nightly/core/macro.writeln.html"><code>writeln!</code></a> macro to print strings with newlines to the buffer. Most of the <code>for</code> loop is similar to the <code>write_byte</code> test and only verifies that the written characters are at the expected place. The new <code>if i &gt;= BUFFER_HEIGHT - 2</code> case verifies that the empty lines that are shifted in on a newline have the <code>writer.color_code</code>, which is different from the initial color.</p> <h3 id="more-tests"><a class="zola-anchor" href="#more-tests" aria-label="Anchor link for: more-tests">🔗</a>More Tests</h3> <p>We only present two basic tests here as an example, but of course many more tests are possible. For example, we could add a test that changes the writer color in between writes, a test that checks that the top line is correctly shifted off the screen on a newline, or a test that checks that non-ASCII characters are handled correctly.</p> <h2 id="summary"><a class="zola-anchor" href="#summary" aria-label="Anchor link for: summary">🔗</a>Summary</h2> <p>Unit testing is a very useful technique to ensure that certain components have a desired behavior. Even though unit tests cannot show the absence of bugs, they’re still a useful tool for finding them and especially for avoiding regressions.</p> <p>This post explained how to set up unit testing in a Rust kernel. We now have a functioning test framework and can easily add tests by adding functions with a <code>#[test]</code> attribute. To run them, a short <code>cargo test</code> suffices. 
We also added a few basic tests for our VGA buffer as an example of what unit tests could look like.</p> <p>We also learned a bit about conditional compilation, Rust’s <a href="https://os.phil-opp.com/unit-testing/#silencing-the-warnings">lint system</a>, how to <a href="https://os.phil-opp.com/unit-testing/#buffer-construction">initialize arrays with non-Copy types</a>, and the <code>dev-dependencies</code> section of the <code>Cargo.toml</code>.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>We now have a working unit testing framework, which gives us the ability to test individual components. However, unit tests have the disadvantage that they run on the host machine and are thus unable to test how components interact with platform-specific parts. For example, we can’t test the <code>println!</code> macro with a unit test because it writes to the VGA text buffer at address <code>0xb8000</code>, which only exists in the bare metal environment.</p> <p>The next post will close this gap by creating a basic <em>integration test</em> framework, which runs the tests in QEMU and thus has access to platform-specific components. This will allow us to test the full system, for example that our kernel boots correctly or that no deadlock occurs on nested <code>println!</code> invocations.</p> Writing an OS in pure Rust Fri, 09 Mar 2018 00:00:00 +0000 https://os.phil-opp.com/news/pure-rust/ https://os.phil-opp.com/news/pure-rust/ <p>Over the past six months we’ve been working on a second edition of this blog. 
Our goals for this new version are <a href="https://github.com/phil-opp/blog_os/issues/360">numerous</a> and we are not done yet, but today we reached a major milestone: It is now possible to build the OS natively on Windows, macOS, and Linux <strong>without any non-Rust dependencies</strong>.</p> <span id="continue-reading"></span> <p>The <a href="https://os.phil-opp.com/edition-1/">first edition</a> required several C-tools for building:</p> <ul> <li>We used the <a href="https://www.gnu.org/software/grub/"><code>GRUB</code></a> bootloader for booting our kernel. To create a bootable disk/CD image we used the <a href="https://www.gnu.org/software/grub/manual/grub/html_node/Invoking-grub_002dmkrescue.html"><code>grub-mkrescue</code></a> tool, which is very difficult to get to run on Windows.</li> <li>The <a href="https://www.gnu.org/software/xorriso/"><code>xorriso</code></a> program was also required, because it is used by <code>grub-mkrescue</code>.</li> <li>GRUB only boots to protected mode, so we needed some assembly code for <a href="https://os.phil-opp.com/entering-longmode/">entering long mode</a>. For building the assembly code, we used the <a href="https://www.nasm.us/xdoc/2.13.03/html/nasmdoc1.html"><code>nasm</code></a> assembler.</li> <li>We used the GNU linker <a href="https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_3.html"><code>ld</code></a> for linking together the assembly files with the Rust code, using a custom <a href="https://sourceware.org/binutils/docs/ld/Scripts.html">linker script</a>.</li> <li>Finally, we used <a href="https://www.gnu.org/software/make/"><code>make</code></a> for automating the various build steps (assembling, compiling the Rust code, linking, invoking <code>grub-mkrescue</code>).</li> </ul> <p>We got lots of feedback that this setup was difficult to get running <a href="https://github.com/phil-opp/blog_os/issues/55">under macOS</a> and Windows. 
As a workaround, we <a href="https://github.com/phil-opp/blog_os/pull/373">added support for docker</a>, but that still required users to install and understand an additional dependency. So when we decided to create a second edition of the blog - originally because the order of posts led to jumps in difficulty - we thought about how we could avoid these C-dependencies.</p> <p>There are lots of alternatives to <code>make</code>, including some Rust tools such as <a href="https://github.com/casey/just">just</a> and <a href="https://sagiegurari.github.io/cargo-make/">cargo-make</a>. Avoiding <code>nasm</code> is also possible by using Rust’s <a href="https://doc.rust-lang.org/stable/core/arch/macro.global_asm.html"><code>global_asm</code></a> feature instead. So there are only two problems left: the bootloader and the linker.</p> <h2 id="a-custom-bootloader">A custom Bootloader</h2> <p>To avoid the dependency on GRUB and to make things more ergonomic, we decided to write <a href="https://github.com/rust-osdev/bootloader">our own bootloader</a> using Rust’s <a href="https://doc.rust-lang.org/stable/core/arch/macro.global_asm.html"><code>global_asm</code></a> feature. This way, the kernel can be significantly simplified, since the switch to long mode and the initial page table layout can already be done in the bootloader. Thus, we can avoid the initial assembly level blog posts in the second edition and directly start with high level Rust code.</p> <p>The bootloader is still an early prototype, but it is already capable of switching to long mode and loading the kernel in the form of a 64-bit ELF binary. 
It also performs the correct page table mapping (with the correct read/write/execute permissions) as it’s specified in the ELF file and creates an initial physical memory map.</p> <p>The plan for the future is to make the bootloader more stable, add documentation, and ultimately add a “Writing a Bootloader” series to the blog, which explains in detail how the bootloader works.</p> <h2 id="linking-with-lld">Linking with LLD</h2> <p>With our custom bootloader in place, the last remaining problem is platform independent linking. Fortunately there is <a href="https://lld.llvm.org/"><code>LLD</code></a>, the cross-platform linker from the LLVM project, which is already very stable for the <code>x86</code> architecture. As a bonus, <code>LLD</code> is <a href="https://github.com/rust-lang/rust/pull/48125">now shipped with Rust</a>, which means that it can be used without any extra installation.</p> <h2 id="the-new-posts">The new Posts</h2> <p>The second edition is already live at <a href="https://os.phil-opp.com/edition-2/">https://os.phil-opp.com/second-edition</a>. Please tell us if you have any feedback on the new posts! We’re planning to move over the content from the <a href="https://os.phil-opp.com/edition-1/">first edition</a> iteratively, in a different order and with various other improvements.</p> <p>Many thanks to everyone who helped to make Rust an even better language for OS development!</p> VGA Text Mode Mon, 26 Feb 2018 00:00:00 +0000 https://os.phil-opp.com/vga-text-mode/ https://os.phil-opp.com/vga-text-mode/ <p>The <a href="https://en.wikipedia.org/wiki/VGA-compatible_text_mode">VGA text mode</a> is a simple way to print text to the screen. In this post, we create an interface that makes its usage safe and simple by encapsulating all unsafety in a separate module. 
We also implement support for Rust’s <a href="https://doc.rust-lang.org/std/fmt/#related-macros">formatting macros</a>.</p> <span id="continue-reading"></span> <p>This blog is openly developed on <a href="https://github.com/phil-opp/blog_os">GitHub</a>. If you have any problems or questions, please open an issue there. You can also leave comments <a href="https://os.phil-opp.com/vga-text-mode/#comments">at the bottom</a>. The complete source code for this post can be found in the <a href="https://github.com/phil-opp/blog_os/tree/post-03"><code>post-03</code></a> branch.</p> <!-- fix for zola anchor checker (target is in template): <a id="comments"> --> <!-- toc --> <h2 id="the-vga-text-buffer"><a class="zola-anchor" href="#the-vga-text-buffer" aria-label="Anchor link for: the-vga-text-buffer">🔗</a>The VGA Text Buffer</h2> <p>To print a character to the screen in VGA text mode, one has to write it to the text buffer of the VGA hardware. The VGA text buffer is a two-dimensional array with typically 25 rows and 80 columns, which is directly rendered to the screen. Each array entry describes a single screen character through the following format:</p> <table><thead><tr><th>Bit(s)</th><th>Value</th></tr></thead><tbody> <tr><td>0-7</td><td>ASCII code point</td></tr> <tr><td>8-11</td><td>Foreground color</td></tr> <tr><td>12-14</td><td>Background color</td></tr> <tr><td>15</td><td>Blink</td></tr> </tbody></table> <p>The first byte represents the character that should be printed in the <a href="https://en.wikipedia.org/wiki/ASCII">ASCII encoding</a>. To be more specific, it isn’t exactly ASCII, but a character set named <a href="https://en.wikipedia.org/wiki/Code_page_437"><em>code page 437</em></a> with some additional characters and slight modifications. For simplicity, we will proceed to call it an ASCII character in this post.</p> <p>The second byte defines how the character is displayed. 
The first four bits define the foreground color, the next three bits the background color, and the last bit whether the character should blink. The following colors are available:</p> <table><thead><tr><th>Number</th><th>Color</th><th>Number + Bright Bit</th><th>Bright Color</th></tr></thead><tbody> <tr><td>0x0</td><td>Black</td><td>0x8</td><td>Dark Gray</td></tr> <tr><td>0x1</td><td>Blue</td><td>0x9</td><td>Light Blue</td></tr> <tr><td>0x2</td><td>Green</td><td>0xa</td><td>Light Green</td></tr> <tr><td>0x3</td><td>Cyan</td><td>0xb</td><td>Light Cyan</td></tr> <tr><td>0x4</td><td>Red</td><td>0xc</td><td>Light Red</td></tr> <tr><td>0x5</td><td>Magenta</td><td>0xd</td><td>Pink</td></tr> <tr><td>0x6</td><td>Brown</td><td>0xe</td><td>Yellow</td></tr> <tr><td>0x7</td><td>Light Gray</td><td>0xf</td><td>White</td></tr> </tbody></table> <p>The fourth bit of the color number (value <code>0x8</code>) is the <em>bright bit</em>, which turns, for example, blue into light blue. For the background color, this bit is repurposed as the blink bit.</p> <p>The VGA text buffer is accessible via <a href="https://en.wikipedia.org/wiki/Memory-mapped_I/O">memory-mapped I/O</a> at the address <code>0xb8000</code>. This means that reads and writes to that address don’t access the RAM but directly access the text buffer on the VGA hardware, so we can read and write it through normal memory operations.</p> <p>Note that memory-mapped hardware might not support all normal RAM operations. For example, a device could only support byte-wise reads and return junk when a <code>u64</code> is read. 
Fortunately, the text buffer <a href="https://web.stanford.edu/class/cs140/projects/pintos/specs/freevga/vga/vgamem.htm#manip">supports normal reads and writes</a>, so we don’t have to treat it in a special way.</p> <h2 id="a-rust-module"><a class="zola-anchor" href="#a-rust-module" aria-label="Anchor link for: a-rust-module">🔗</a>A Rust Module</h2> <p>Now that we know how the VGA buffer works, we can create a Rust module to handle printing:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span style="color:#569cd6;">mod </span><span>vga_buffer; </span></code></pre> <p>For the content of this module, we create a new <code>src/vga_buffer.rs</code> file. All of the code below goes into our new module (unless specified otherwise).</p> <h3 id="colors"><a class="zola-anchor" href="#colors" aria-label="Anchor link for: colors">🔗</a>Colors</h3> <p>First, we represent the different colors using an enum:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span>#[allow(dead_code)] </span><span>#[derive(Debug, Clone, Copy, PartialEq, Eq)] </span><span>#[repr(u8)] </span><span style="color:#569cd6;">pub enum </span><span>Color { </span><span> Black = </span><span style="color:#b5cea8;">0</span><span>, </span><span> Blue = </span><span style="color:#b5cea8;">1</span><span>, </span><span> Green = </span><span style="color:#b5cea8;">2</span><span>, </span><span> Cyan = </span><span style="color:#b5cea8;">3</span><span>, </span><span> Red = </span><span style="color:#b5cea8;">4</span><span>, </span><span> Magenta = </span><span style="color:#b5cea8;">5</span><span>, </span><span> Brown = </span><span style="color:#b5cea8;">6</span><span>, </span><span> LightGray = 
</span><span style="color:#b5cea8;">7</span><span>, </span><span> DarkGray = </span><span style="color:#b5cea8;">8</span><span>, </span><span> LightBlue = </span><span style="color:#b5cea8;">9</span><span>, </span><span> LightGreen = </span><span style="color:#b5cea8;">10</span><span>, </span><span> LightCyan = </span><span style="color:#b5cea8;">11</span><span>, </span><span> LightRed = </span><span style="color:#b5cea8;">12</span><span>, </span><span> Pink = </span><span style="color:#b5cea8;">13</span><span>, </span><span> Yellow = </span><span style="color:#b5cea8;">14</span><span>, </span><span> White = </span><span style="color:#b5cea8;">15</span><span>, </span><span>} </span></code></pre> <p>We use a <a href="https://doc.rust-lang.org/rust-by-example/custom_types/enum/c_like.html">C-like enum</a> here to explicitly specify the number for each color. Because of the <code>repr(u8)</code> attribute, each enum variant is stored as a <code>u8</code>. Actually 4 bits would be sufficient, but Rust doesn’t have a <code>u4</code> type.</p> <p>Normally the compiler would issue a warning for each unused variant. 
By using the <code>#[allow(dead_code)]</code> attribute, we disable these warnings for the <code>Color</code> enum.</p> <p>By <a href="https://doc.rust-lang.org/rust-by-example/trait/derive.html">deriving</a> the <a href="https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html"><code>Copy</code></a>, <a href="https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html"><code>Clone</code></a>, <a href="https://doc.rust-lang.org/nightly/core/fmt/trait.Debug.html"><code>Debug</code></a>, <a href="https://doc.rust-lang.org/nightly/core/cmp/trait.PartialEq.html"><code>PartialEq</code></a>, and <a href="https://doc.rust-lang.org/nightly/core/cmp/trait.Eq.html"><code>Eq</code></a> traits, we enable <a href="https://doc.rust-lang.org/1.30.0/book/first-edition/ownership.html#copy-types">copy semantics</a> for the type and make it printable and comparable.</p> <p>To represent a full color code that specifies foreground and background color, we create a <a href="https://doc.rust-lang.org/rust-by-example/generics/new_types.html">newtype</a> on top of <code>u8</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span>#[derive(Debug, Clone, Copy, PartialEq, Eq)] </span><span>#[repr(transparent)] </span><span style="color:#569cd6;">struct </span><span>ColorCode(</span><span style="color:#569cd6;">u8</span><span>); </span><span> </span><span style="color:#569cd6;">impl </span><span>ColorCode { </span><span> </span><span style="color:#569cd6;">fn </span><span>new(foreground: Color, background: Color) -&gt; ColorCode { </span><span> ColorCode((background </span><span style="color:#569cd6;">as u8</span><span>) &lt;&lt; </span><span style="color:#b5cea8;">4 </span><span style="color:#569cd6;">| </span><span>(foreground </span><span style="color:#569cd6;">as u8</span><span>)) </span><span> } </span><span>} 
</span></code></pre> <p>The <code>ColorCode</code> struct contains the full color byte, containing foreground and background color. Like before, we derive the <code>Copy</code> and <code>Debug</code> traits for it. To ensure that the <code>ColorCode</code> has the exact same data layout as a <code>u8</code>, we use the <a href="https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent"><code>repr(transparent)</code></a> attribute.</p> <h3 id="text-buffer"><a class="zola-anchor" href="#text-buffer" aria-label="Anchor link for: text-buffer">🔗</a>Text Buffer</h3> <p>Now we can add structures to represent a screen character and the text buffer:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span>#[derive(Debug, Clone, Copy, PartialEq, Eq)] </span><span>#[repr(C)] </span><span style="color:#569cd6;">struct </span><span>ScreenChar { </span><span> ascii_character: </span><span style="color:#569cd6;">u8</span><span>, </span><span> color_code: ColorCode, </span><span>} </span><span> </span><span style="color:#569cd6;">const </span><span style="color:#b4cea8;">BUFFER_HEIGHT</span><span>: </span><span style="color:#569cd6;">usize </span><span>= </span><span style="color:#b5cea8;">25</span><span>; </span><span style="color:#569cd6;">const </span><span style="color:#b4cea8;">BUFFER_WIDTH</span><span>: </span><span style="color:#569cd6;">usize </span><span>= </span><span style="color:#b5cea8;">80</span><span>; </span><span> </span><span>#[repr(transparent)] </span><span style="color:#569cd6;">struct </span><span>Buffer { </span><span> chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT], </span><span>} </span></code></pre> <p>Since the field ordering in default structs is undefined in Rust, we need the <a href="https://doc.rust-lang.org/nightly/nomicon/other-reprs.html#reprc"><code>repr(C)</code></a> 
attribute. It guarantees that the struct’s fields are laid out exactly like in a C struct and thus guarantees the correct field ordering. For the <code>Buffer</code> struct, we use <a href="https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent"><code>repr(transparent)</code></a> again to ensure that it has the same memory layout as its single field.</p> <p>To actually write to the screen, we now create a writer type:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#569cd6;">pub struct </span><span>Writer { </span><span> column_position: </span><span style="color:#569cd6;">usize</span><span>, </span><span> color_code: ColorCode, </span><span> buffer: </span><span style="color:#569cd6;">&amp;&#39;static mut</span><span> Buffer, </span><span>} </span></code></pre> <p>The writer will always write to the last line and shift lines up when a line is full (or on <code>\n</code>). The <code>column_position</code> field keeps track of the current position in the last row. The current foreground and background colors are specified by <code>color_code</code> and a reference to the VGA buffer is stored in <code>buffer</code>. Note that we need an <a href="https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#lifetime-annotation-syntax">explicit lifetime</a> here to tell the compiler how long the reference is valid. The <a href="https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime"><code>'static</code></a> lifetime specifies that the reference is valid for the whole program run time (which is true for the VGA text buffer).</p> <h3 id="printing"><a class="zola-anchor" href="#printing" aria-label="Anchor link for: printing">🔗</a>Printing</h3> <p>Now we can use the <code>Writer</code> to modify the buffer’s characters. 
First we create a method to write a single ASCII byte:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Writer { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>write_byte(</span><span style="color:#569cd6;">&amp;mut </span><span>self, byte: </span><span style="color:#569cd6;">u8</span><span>) { </span><span> </span><span style="color:#569cd6;">match</span><span> byte { </span><span> </span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&#39; </span><span style="color:#569cd6;">=&gt; </span><span>self.new_line(), </span><span> byte </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#569cd6;">if </span><span>self.column_position &gt;= </span><span style="color:#b4cea8;">BUFFER_WIDTH </span><span>{ </span><span> self.new_line(); </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> row = </span><span style="color:#b4cea8;">BUFFER_HEIGHT </span><span>- </span><span style="color:#b5cea8;">1</span><span>; </span><span> </span><span style="color:#569cd6;">let</span><span> col = self.column_position; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> color_code = self.color_code; </span><span> self.buffer.chars[row][col] = ScreenChar { </span><span> ascii_character: byte, </span><span> color_code, </span><span> }; </span><span> self.column_position += </span><span style="color:#b5cea8;">1</span><span>; </span><span> } </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>new_line(</span><span style="color:#569cd6;">&amp;mut </span><span>self) {</span><span 
style="color:#608b4e;">/* TODO */</span><span>} </span><span>} </span></code></pre> <p>If the byte is the <a href="https://en.wikipedia.org/wiki/Newline">newline</a> byte <code>\n</code>, the writer does not print anything. Instead, it calls a <code>new_line</code> method, which we’ll implement later. Other bytes get printed to the screen in the second <code>match</code> case.</p> <p>When printing a byte, the writer checks if the current line is full. In that case, a <code>new_line</code> call is used to wrap the line. Then it writes a new <code>ScreenChar</code> to the buffer at the current position. Finally, the current column position is advanced.</p> <p>To print whole strings, we can convert them to bytes and print them one-by-one:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Writer { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>write_string(</span><span style="color:#569cd6;">&amp;mut </span><span>self, s: </span><span style="color:#569cd6;">&amp;str</span><span>) { </span><span> </span><span style="color:#569cd6;">for</span><span> byte </span><span style="color:#569cd6;">in</span><span> s.bytes() { </span><span> </span><span style="color:#569cd6;">match</span><span> byte { </span><span> </span><span style="color:#608b4e;">// printable ASCII byte or newline </span><span> </span><span style="color:#b5cea8;">0x20</span><span style="color:#569cd6;">..</span><span>=</span><span style="color:#b5cea8;">0x7e </span><span style="color:#569cd6;">| b</span><span style="color:#d69d85;">&#39;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&#39; </span><span style="color:#569cd6;">=&gt; </span><span>self.write_byte(byte), </span><span> </span><span style="color:#608b4e;">// not part of printable ASCII range 
</span><span> </span><span style="color:#569cd6;">_ =&gt; </span><span>self.write_byte(</span><span style="color:#b5cea8;">0xfe</span><span>), </span><span> } </span><span> </span><span> } </span><span> } </span><span>} </span></code></pre> <p>The VGA text buffer only supports ASCII and the additional bytes of <a href="https://en.wikipedia.org/wiki/Code_page_437">code page 437</a>. Rust strings are <a href="https://www.fileformat.info/info/unicode/utf8.htm">UTF-8</a> by default, so they might contain bytes that are not supported by the VGA text buffer. We use a <code>match</code> to differentiate printable ASCII bytes (a newline or anything in between a space character and a <code>~</code> character) and unprintable bytes. For unprintable bytes, we print a <code>■</code> character, which has the hex code <code>0xfe</code> on the VGA hardware.</p> <h4 id="try-it-out"><a class="zola-anchor" href="#try-it-out" aria-label="Anchor link for: try-it-out">🔗</a>Try it out!</h4> <p>To write some characters to the screen, you can create a temporary function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>print_something() { </span><span> </span><span style="color:#569cd6;">let mut</span><span> writer = Writer { </span><span> column_position: </span><span style="color:#b5cea8;">0</span><span>, </span><span> color_code: ColorCode::new(Color::Yellow, Color::Black), </span><span> buffer: </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;mut </span><span>*(</span><span style="color:#b5cea8;">0xb8000 </span><span style="color:#569cd6;">as *mut</span><span> Buffer) }, </span><span> }; </span><span> </span><span> writer.write_byte(</span><span style="color:#569cd6;">b</span><span 
style="color:#d69d85;">&#39;H&#39;</span><span>); </span><span> writer.write_string(</span><span style="color:#d69d85;">&quot;ello &quot;</span><span>); </span><span> writer.write_string(</span><span style="color:#d69d85;">&quot;Wörld!&quot;</span><span>); </span><span>} </span></code></pre> <p>It first creates a new Writer that points to the VGA buffer at <code>0xb8000</code>. The syntax for this might seem a bit strange: First, we cast the integer <code>0xb8000</code> as a mutable <a href="https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html#dereferencing-a-raw-pointer">raw pointer</a>. Then we convert it to a mutable reference by dereferencing it (through <code>*</code>) and immediately borrowing it again (through <code>&amp;mut</code>). This conversion requires an <a href="https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html"><code>unsafe</code> block</a>, since the compiler can’t guarantee that the raw pointer is valid.</p> <p>Then it writes the byte <code>b'H'</code> to it. The <code>b</code> prefix creates a <a href="https://doc.rust-lang.org/reference/tokens.html#byte-literals">byte literal</a>, which represents an ASCII character. By writing the strings <code>"ello "</code> and <code>"Wörld!"</code>, we test our <code>write_string</code> method and the handling of unprintable characters. To see the output, we need to call the <code>print_something</code> function from our <code>_start</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> vga_buffer::print_something(); </span><span> </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>When we run our project now, a <code>Hello W■■rld!</code> should be printed in the <em>lower</em> left corner of the screen in yellow:</p> <p><img src="https://os.phil-opp.com/vga-text-mode/vga-hello.png" alt="QEMU output with a yellow Hello W■■rld! in the lower left corner" /></p> <p>Notice that the <code>ö</code> is printed as two <code>■</code> characters. That’s because <code>ö</code> is represented by two bytes in <a href="https://www.fileformat.info/info/unicode/utf8.htm">UTF-8</a>, neither of which falls into the printable ASCII range. In fact, this is a fundamental property of UTF-8: the individual bytes of multi-byte values are never valid ASCII.</p> <h3 id="volatile"><a class="zola-anchor" href="#volatile" aria-label="Anchor link for: volatile">🔗</a>Volatile</h3> <p>We just saw that our message was printed correctly. However, it might not work with future Rust compilers that optimize more aggressively.</p> <p>The problem is that we only write to the <code>Buffer</code> and never read from it again. The compiler doesn’t know that we really access VGA buffer memory (instead of normal RAM) and knows nothing about the side effect that some characters appear on the screen. So it might decide that these writes are unnecessary and can be omitted. To avoid this erroneous optimization, we need to specify these writes as <em><a href="https://en.wikipedia.org/wiki/Volatile_(computer_programming)">volatile</a></em>. This tells the compiler that the write has side effects and should not be optimized away.</p> <p>In order to use volatile writes for the VGA buffer, we use the <a href="https://docs.rs/volatile">volatile</a> library. 
This <em>crate</em> (this is what packages are called in the Rust world) provides a <code>Volatile</code> wrapper type with <code>read</code> and <code>write</code> methods. These methods internally use the <a href="https://doc.rust-lang.org/nightly/core/ptr/fn.read_volatile.html">read_volatile</a> and <a href="https://doc.rust-lang.org/nightly/core/ptr/fn.write_volatile.html">write_volatile</a> functions of the core library and thus guarantee that the reads/writes are not optimized away.</p> <p>We can add a dependency on the <code>volatile</code> crate by adding it to the <code>dependencies</code> section of our <code>Cargo.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">volatile </span><span>= </span><span style="color:#d69d85;">&quot;0.2.6&quot; </span></code></pre> <p>Make sure to specify <code>volatile</code> version <code>0.2.6</code>. Newer versions of the crate are not compatible with this post. <code>0.2.6</code> is the <a href="https://semver.org/">semantic</a> version number. For more information, see the <a href="https://doc.crates.io/specifying-dependencies.html">Specifying Dependencies</a> guide of the cargo documentation.</p> <p>Let’s use it to make writes to the VGA buffer volatile. 
We update our <code>Buffer</code> type as follows:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#569cd6;">use </span><span>volatile::Volatile; </span><span> </span><span style="color:#569cd6;">struct </span><span>Buffer { </span><span> chars: [[Volatile&lt;ScreenChar&gt;; BUFFER_WIDTH]; BUFFER_HEIGHT], </span><span>} </span></code></pre> <p>Instead of a <code>ScreenChar</code>, we’re now using a <code>Volatile&lt;ScreenChar&gt;</code>. (The <code>Volatile</code> type is <a href="https://doc.rust-lang.org/book/ch10-01-syntax.html">generic</a> and can wrap (almost) any type). This ensures that we can’t accidentally write to it “normally”. Instead, we have to use the <code>write</code> method now.</p> <p>This means that we have to update our <code>Writer::write_byte</code> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Writer { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>write_byte(</span><span style="color:#569cd6;">&amp;mut </span><span>self, byte: </span><span style="color:#569cd6;">u8</span><span>) { </span><span> </span><span style="color:#569cd6;">match</span><span> byte { </span><span> </span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&#39; </span><span style="color:#569cd6;">=&gt; </span><span>self.new_line(), </span><span> byte </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#569cd6;">... 
</span><span> </span><span> self.buffer.chars[row][col].write(ScreenChar { </span><span> ascii_character: byte, </span><span> color_code, </span><span> }); </span><span> </span><span style="color:#569cd6;">... </span><span> } </span><span> } </span><span> } </span><span> </span><span style="color:#569cd6;">... </span><span>} </span></code></pre> <p>Instead of a typical assignment using <code>=</code>, we’re now using the <code>write</code> method. Now we can guarantee that the compiler will never optimize away this write.</p> <h3 id="formatting-macros"><a class="zola-anchor" href="#formatting-macros" aria-label="Anchor link for: formatting-macros">🔗</a>Formatting Macros</h3> <p>It would be nice to support Rust’s formatting macros, too. That way, we can easily print different types, like integers or floats. To support them, we need to implement the <a href="https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html"><code>core::fmt::Write</code></a> trait. The only required method of this trait is <code>write_str</code>, which looks quite similar to our <code>write_string</code> method, just with a <code>fmt::Result</code> return type:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#569cd6;">use </span><span>core::fmt; </span><span> </span><span style="color:#569cd6;">impl </span><span>fmt::Write </span><span style="color:#569cd6;">for </span><span>Writer { </span><span> </span><span style="color:#569cd6;">fn </span><span>write_str(</span><span style="color:#569cd6;">&amp;mut </span><span>self, s: </span><span style="color:#569cd6;">&amp;str</span><span>) -&gt; fmt::Result { </span><span> self.write_string(s); </span><span> Ok(()) </span><span> } </span><span>} </span></code></pre> <p>The <code>Ok(())</code> is just an <code>Ok</code> Result containing the <code>()</code> type.</p> 
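<p>The same pattern can be tried on a normal host machine, without VGA hardware. The following is only a minimal sketch under stated assumptions: <code>LineBuffer</code>, its 32-byte capacity, the <code>?</code> placeholder byte, and the <code>format_demo</code> helper are hypothetical names invented for this illustration and are not part of the kernel code. It implements <code>fmt::Write</code> for a type that stores bytes in a fixed array, filtering unprintable bytes in the same way as <code>write_string</code>:</p>

```rust
use core::fmt::{self, Write};

/// Hypothetical stand-in for the VGA `Writer`: it collects bytes in a
/// fixed-size array instead of writing them to the screen.
struct LineBuffer {
    bytes: [u8; 32],
    len: usize,
}

impl LineBuffer {
    fn new() -> Self {
        LineBuffer { bytes: [0; 32], len: 0 }
    }

    /// Stores a single byte; bytes beyond the capacity are dropped.
    fn write_byte(&mut self, byte: u8) {
        if self.len < self.bytes.len() {
            self.bytes[self.len] = byte;
            self.len += 1;
        }
    }

    /// Returns the stored bytes as a string slice. This cannot fail here
    /// because `write_str` below only ever stores ASCII bytes.
    fn as_str(&self) -> &str {
        core::str::from_utf8(&self.bytes[..self.len]).unwrap()
    }
}

impl fmt::Write for LineBuffer {
    // `write_str` is the only required method of the trait.
    fn write_str(&mut self, s: &str) -> fmt::Result {
        for byte in s.bytes() {
            match byte {
                // printable ASCII byte or newline
                0x20..=0x7e | b'\n' => self.write_byte(byte),
                // anything else is replaced by a placeholder (assumption: `?`)
                _ => self.write_byte(b'?'),
            }
        }
        Ok(())
    }
}

/// Formats an integer and a non-ASCII string through the `write!` macro.
fn format_demo() -> LineBuffer {
    let mut out = LineBuffer::new();
    write!(out, "{} and {}", 42, "ö").unwrap();
    out
}

fn main() {
    let out = format_demo();
    println!("{}", out.as_str());
}
```

<p>Because the formatting machinery only ever calls <code>write_str</code>, the two UTF-8 bytes of <code>ö</code> each pass through the filter separately and end up as two placeholder bytes, mirroring the two <code>■</code> characters seen on the VGA screen.</p>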
<p>Now we can use Rust’s built-in <code>write!</code>/<code>writeln!</code> formatting macros:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>print_something() { </span><span> </span><span style="color:#569cd6;">use </span><span>core::fmt::Write; </span><span> </span><span style="color:#569cd6;">let mut</span><span> writer = Writer { </span><span> column_position: </span><span style="color:#b5cea8;">0</span><span>, </span><span> color_code: ColorCode::new(Color::Yellow, Color::Black), </span><span> buffer: </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;mut </span><span>*(</span><span style="color:#b5cea8;">0xb8000 </span><span style="color:#569cd6;">as *mut</span><span> Buffer) }, </span><span> }; </span><span> </span><span> writer.write_byte(</span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39;H&#39;</span><span>); </span><span> writer.write_string(</span><span style="color:#d69d85;">&quot;ello! &quot;</span><span>); </span><span> write!(writer, </span><span style="color:#d69d85;">&quot;The numbers are </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;"> and </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#b5cea8;">42</span><span>, </span><span style="color:#b5cea8;">1.0</span><span>/</span><span style="color:#b5cea8;">3.0</span><span>).unwrap(); </span><span>} </span></code></pre> <p>Now you should see a <code>Hello! The numbers are 42 and 0.3333333333333333</code> at the bottom of the screen. 
The <code>write!</code> call returns a <code>Result</code> which causes a warning if not used, so we call the <a href="https://doc.rust-lang.org/core/result/enum.Result.html#method.unwrap"><code>unwrap</code></a> function on it, which panics if an error occurs. This isn’t a problem in our case, since writes to the VGA buffer never fail.</p> <h3 id="newlines"><a class="zola-anchor" href="#newlines" aria-label="Anchor link for: newlines">🔗</a>Newlines</h3> <p>Right now, we just ignore newlines and characters that don’t fit into the line anymore. Instead, we want to move every character one line up (the top line gets deleted) and start at the beginning of the last line again. To do this, we add an implementation for the <code>new_line</code> method of <code>Writer</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Writer { </span><span> </span><span style="color:#569cd6;">fn </span><span>new_line(</span><span style="color:#569cd6;">&amp;mut </span><span>self) { </span><span> </span><span style="color:#569cd6;">for</span><span> row </span><span style="color:#569cd6;">in </span><span style="color:#b5cea8;">1</span><span style="color:#569cd6;">..</span><span style="color:#b4cea8;">BUFFER_HEIGHT </span><span>{ </span><span> </span><span style="color:#569cd6;">for</span><span> col </span><span style="color:#569cd6;">in </span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b4cea8;">BUFFER_WIDTH </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> character = self.buffer.chars[row][col].read(); </span><span> self.buffer.chars[row - </span><span style="color:#b5cea8;">1</span><span>][col].write(character); </span><span> } </span><span> } </span><span> self.clear_row(</span><span 
style="color:#b4cea8;">BUFFER_HEIGHT </span><span>- </span><span style="color:#b5cea8;">1</span><span>); </span><span> self.column_position = </span><span style="color:#b5cea8;">0</span><span>; </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>clear_row(</span><span style="color:#569cd6;">&amp;mut </span><span>self, row: </span><span style="color:#569cd6;">usize</span><span>) {</span><span style="color:#608b4e;">/* TODO */</span><span>} </span><span>} </span></code></pre> <p>We iterate over all the screen characters and move each character one row up. Note that the upper bound of the range notation (<code>..</code>) is exclusive. We also omit the 0th row (the first range starts at <code>1</code>) because it’s the row that is shifted off screen.</p> <p>To finish the newline code, we add the <code>clear_row</code> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Writer { </span><span> </span><span style="color:#569cd6;">fn </span><span>clear_row(</span><span style="color:#569cd6;">&amp;mut </span><span>self, row: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> </span><span style="color:#569cd6;">let</span><span> blank = ScreenChar { </span><span> ascii_character: </span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39; &#39;</span><span>, </span><span> color_code: self.color_code, </span><span> }; </span><span> </span><span style="color:#569cd6;">for</span><span> col </span><span style="color:#569cd6;">in </span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b4cea8;">BUFFER_WIDTH </span><span>{ </span><span> self.buffer.chars[row][col].write(blank); </span><span> } </span><span> } </span><span>} 
</span></code></pre> <p>This method clears a row by overwriting all of its characters with a space character.</p> <h2 id="a-global-interface"><a class="zola-anchor" href="#a-global-interface" aria-label="Anchor link for: a-global-interface">🔗</a>A Global Interface</h2> <p>To provide a global writer that can be used as an interface from other modules without carrying a <code>Writer</code> instance around, we try to create a static <code>WRITER</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#569cd6;">pub static </span><span style="color:#b4cea8;">WRITER</span><span>: Writer = Writer { </span><span> column_position: </span><span style="color:#b5cea8;">0</span><span>, </span><span> color_code: ColorCode::new(Color::Yellow, Color::Black), </span><span> buffer: </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;mut </span><span>*(</span><span style="color:#b5cea8;">0xb8000 </span><span style="color:#569cd6;">as *mut</span><span> Buffer) }, </span><span>}; </span></code></pre> <p>However, if we try to compile it now, the following errors occur:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error[E0015]: calls in statics are limited to constant functions, tuple structs and tuple variants </span><span> --&gt; src/vga_buffer.rs:7:17 </span><span> | </span><span>7 | color_code: ColorCode::new(Color::Yellow, Color::Black), </span><span> | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ </span><span> </span><span>error[E0396]: raw pointers cannot be dereferenced in statics </span><span> --&gt; src/vga_buffer.rs:8:22 </span><span> | </span><span>8 | buffer: unsafe { &amp;mut *(0xb8000 as *mut Buffer) }, </span><span> | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereference of raw pointer in constant </span><span> 
</span><span>error[E0017]: references in statics may only refer to immutable values </span><span> --&gt; src/vga_buffer.rs:8:22 </span><span> | </span><span>8 | buffer: unsafe { &amp;mut *(0xb8000 as *mut Buffer) }, </span><span> | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values </span><span> </span><span>error[E0017]: references in statics may only refer to immutable values </span><span> --&gt; src/vga_buffer.rs:8:13 </span><span> | </span><span>8 | buffer: unsafe { &amp;mut *(0xb8000 as *mut Buffer) }, </span><span> | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values </span></code></pre> <p>To understand what’s happening here, we need to know that statics are initialized at compile time, in contrast to normal variables that are initialized at run time. The component of the Rust compiler that evaluates such initialization expressions is called the “<a href="https://rustc-dev-guide.rust-lang.org/const-eval.html">const evaluator</a>”. Its functionality is still limited, but there is ongoing work to expand it, for example in the “<a href="https://github.com/rust-lang/rfcs/pull/2345">Allow panicking in constants</a>” RFC.</p> <p>The issue with <code>ColorCode::new</code> would be solvable by using <a href="https://doc.rust-lang.org/reference/const_eval.html#const-functions"><code>const</code> functions</a>, but the fundamental problem here is that Rust’s const evaluator is not able to convert raw pointers to references at compile time. Maybe it will work someday, but until then, we have to find another solution.</p> <h3 id="lazy-statics"><a class="zola-anchor" href="#lazy-statics" aria-label="Anchor link for: lazy-statics">🔗</a>Lazy Statics</h3> <p>The one-time initialization of statics with non-const functions is a common problem in Rust. Fortunately, there already exists a good solution in a crate named <a href="https://docs.rs/lazy_static/1.0.1/lazy_static/">lazy_static</a>. 
This crate provides a <code>lazy_static!</code> macro that defines a lazily initialized <code>static</code>. Instead of computing its value at compile time, the <code>static</code> lazily initializes itself when accessed for the first time. Thus, the initialization happens at runtime, so arbitrarily complex initialization code is possible.</p> <p>Let’s add the <code>lazy_static</code> crate to our project:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies.lazy_static</span><span>] </span><span style="color:#569cd6;">version </span><span>= </span><span style="color:#d69d85;">&quot;1.0&quot; </span><span style="color:#569cd6;">features </span><span>= [</span><span style="color:#d69d85;">&quot;spin_no_std&quot;</span><span>] </span></code></pre> <p>We need the <code>spin_no_std</code> feature, since we don’t link the standard library.</p> <p>With <code>lazy_static</code>, we can define our static <code>WRITER</code> without problems:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#569cd6;">use </span><span>lazy_static::lazy_static; </span><span> </span><span>lazy_static! 
{ </span><span> </span><span style="color:#569cd6;">pub static ref </span><span style="color:#b4cea8;">WRITER</span><span>: Writer = Writer { </span><span> column_position: </span><span style="color:#b5cea8;">0</span><span>, </span><span> color_code: ColorCode::new(Color::Yellow, Color::Black), </span><span> buffer: </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;mut </span><span>*(</span><span style="color:#b5cea8;">0xb8000 </span><span style="color:#569cd6;">as *mut</span><span> Buffer) }, </span><span> }; </span><span>} </span></code></pre> <p>However, this <code>WRITER</code> is pretty useless since it is immutable. This means that we can’t write anything to it (since all the write methods take <code>&amp;mut self</code>). One possible solution would be to use a <a href="https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable">mutable static</a>. But then every read and write to it would be unsafe since it could easily introduce data races and other bad things. Using <code>static mut</code> is highly discouraged. There were even proposals to <a href="https://internals.rust-lang.org/t/pre-rfc-remove-static-mut/1437">remove it</a>. But what are the alternatives? We could try to use an immutable static with a cell type like <a href="https://doc.rust-lang.org/book/ch15-05-interior-mutability.html#keeping-track-of-borrows-at-runtime-with-refcellt">RefCell</a> or even <a href="https://doc.rust-lang.org/nightly/core/cell/struct.UnsafeCell.html">UnsafeCell</a> that provides <a href="https://doc.rust-lang.org/book/ch15-05-interior-mutability.html">interior mutability</a>. 
But these types aren’t <a href="https://doc.rust-lang.org/nightly/core/marker/trait.Sync.html">Sync</a> (with good reason), so we can’t use them in statics.</p> <h3 id="spinlocks"><a class="zola-anchor" href="#spinlocks" aria-label="Anchor link for: spinlocks">🔗</a>Spinlocks</h3> <p>To get synchronized interior mutability, users of the standard library can use <a href="https://doc.rust-lang.org/nightly/std/sync/struct.Mutex.html">Mutex</a>. It provides mutual exclusion by blocking threads when the resource is already locked. But our basic kernel does not have any blocking support or even a concept of threads, so we can’t use it either. However, there is a really basic kind of mutex in computer science that requires no operating system features: the <a href="https://en.wikipedia.org/wiki/Spinlock">spinlock</a>. Instead of blocking, the threads simply try to lock it again and again in a tight loop, thus burning CPU time until the mutex is free again.</p> <p>To use a spinning mutex, we can add the <a href="https://crates.io/crates/spin">spin crate</a> as a dependency:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">spin </span><span>= </span><span style="color:#d69d85;">&quot;0.5.2&quot; </span></code></pre> <p>Then we can use the spinning mutex to add safe <a href="https://doc.rust-lang.org/book/ch15-05-interior-mutability.html">interior mutability</a> to our static <code>WRITER</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#569cd6;">use </span><span>spin::Mutex; </span><span style="color:#569cd6;">... 
</span><span>lazy_static! { </span><span> </span><span style="color:#569cd6;">pub static ref </span><span style="color:#b4cea8;">WRITER</span><span>: Mutex&lt;Writer&gt; = Mutex::new(Writer { </span><span> column_position: </span><span style="color:#b5cea8;">0</span><span>, </span><span> color_code: ColorCode::new(Color::Yellow, Color::Black), </span><span> buffer: </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;mut </span><span>*(</span><span style="color:#b5cea8;">0xb8000 </span><span style="color:#569cd6;">as *mut</span><span> Buffer) }, </span><span> }); </span><span>} </span></code></pre> <p>Now we can delete the <code>print_something</code> function and print directly from our <code>_start</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#569cd6;">use </span><span>core::fmt::Write; </span><span> vga_buffer::</span><span style="color:#b4cea8;">WRITER</span><span>.lock().write_str(</span><span style="color:#d69d85;">&quot;Hello again&quot;</span><span>).unwrap(); </span><span> write!(vga_buffer::WRITER.lock(), </span><span style="color:#d69d85;">&quot;, some numbers: </span><span style="color:#b4cea8;">{} {}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#b5cea8;">42</span><span>, </span><span style="color:#b5cea8;">1.337</span><span>).unwrap(); </span><span> </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>We need to import the <code>fmt::Write</code> trait in order to be able to use its functions.</p> <h3 id="safety"><a class="zola-anchor" href="#safety" aria-label="Anchor link for: safety">🔗</a>Safety</h3> <p>Note that we only have a single unsafe block in our code, which is needed to create a <code>Buffer</code> reference pointing to <code>0xb8000</code>. Afterwards, all operations are safe. Rust uses bounds checking for array accesses by default, so we can’t accidentally write outside the buffer. Thus, we encoded the required conditions in the type system and are able to provide a safe interface to the outside.</p> <h3 id="a-println-macro"><a class="zola-anchor" href="#a-println-macro" aria-label="Anchor link for: a-println-macro">🔗</a>A println Macro</h3> <p>Now that we have a global writer, we can add a <code>println</code> macro that can be used from anywhere in the codebase. Rust’s <a href="https://doc.rust-lang.org/nightly/book/ch19-06-macros.html#declarative-macros-with-macro_rules-for-general-metaprogramming">macro syntax</a> is a bit strange, so we won’t try to write a macro from scratch. 
Instead, we look at the source of the <a href="https://doc.rust-lang.org/nightly/std/macro.println!.html"><code>println!</code> macro</a> in the standard library:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[macro_export] </span><span>macro_rules! println { </span><span> () </span><span style="color:#569cd6;">=&gt; </span><span>(print!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>)); </span><span> (</span><span style="color:#569cd6;">$</span><span>($arg:</span><span style="color:#569cd6;">tt</span><span>)</span><span style="color:#569cd6;">*</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>(print!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{}</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>, format_args!(</span><span style="color:#569cd6;">$</span><span>($arg)*))); </span><span>} </span></code></pre> <p>Macros are defined through one or more rules, similar to <code>match</code> arms. The <code>println</code> macro has two rules: The first rule is for invocations without arguments, e.g., <code>println!()</code>, which is expanded to <code>print!("\n")</code> and thus just prints a newline. The second rule is for invocations with parameters such as <code>println!("Hello")</code> or <code>println!("Number: {}", 4)</code>. It is also expanded to an invocation of the <code>print!</code> macro, passing all arguments and an additional newline <code>\n</code> at the end.</p> <p>The <code>#[macro_export]</code> attribute makes the macro available to the whole crate (not just the module it is defined in) and external crates. 
It also places the macro at the crate root, which means we have to import the macro through <code>use std::println</code> instead of <code>std::macros::println</code>.</p> <p>The <a href="https://doc.rust-lang.org/nightly/std/macro.print!.html"><code>print!</code> macro</a> is defined as:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[macro_export] </span><span>macro_rules! print { </span><span> (</span><span style="color:#569cd6;">$</span><span>($arg:</span><span style="color:#569cd6;">tt</span><span>)</span><span style="color:#569cd6;">*</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>($crate::io::_print(format_args!(</span><span style="color:#569cd6;">$</span><span>($arg)*))); </span><span>} </span></code></pre> <p>The macro expands to a call of the <a href="https://github.com/rust-lang/rust/blob/29f5c699b11a6a148f097f82eaa05202f8799bbc/src/libstd/io/stdio.rs#L698"><code>_print</code> function</a> in the <code>io</code> module. The <a href="https://doc.rust-lang.org/1.30.0/book/first-edition/macros.html#the-variable-crate"><code>$crate</code> variable</a> ensures that the macro also works from outside the <code>std</code> crate by expanding to <code>std</code> when it’s used in other crates.</p> <p>The <a href="https://doc.rust-lang.org/nightly/std/macro.format_args.html"><code>format_args</code> macro</a> builds a <a href="https://doc.rust-lang.org/nightly/core/fmt/struct.Arguments.html">fmt::Arguments</a> type from the passed arguments, which is passed to <code>_print</code>. The <a href="https://github.com/rust-lang/rust/blob/29f5c699b11a6a148f097f82eaa05202f8799bbc/src/libstd/io/stdio.rs#L698"><code>_print</code> function</a> of libstd calls <code>print_to</code>, which is rather complicated because it supports different <code>Stdout</code> devices. 
We don’t need that complexity since we just want to print to the VGA buffer.</p> <p>To print to the VGA buffer, we just copy the <code>println!</code> and <code>print!</code> macros, but modify them to use our own <code>_print</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span>#[macro_export] </span><span>macro_rules! print { </span><span> (</span><span style="color:#569cd6;">$</span><span>($arg:</span><span style="color:#569cd6;">tt</span><span>)</span><span style="color:#569cd6;">*</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>($crate::vga_buffer::_print(format_args!(</span><span style="color:#569cd6;">$</span><span>($arg)*))); </span><span>} </span><span> </span><span>#[macro_export] </span><span>macro_rules! println { </span><span> () </span><span style="color:#569cd6;">=&gt; </span><span>($crate::print</span><span style="color:#569cd6;">!</span><span>(</span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>)); </span><span> (</span><span style="color:#569cd6;">$</span><span>($arg:</span><span style="color:#569cd6;">tt</span><span>)</span><span style="color:#569cd6;">*</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>($crate::print</span><span style="color:#569cd6;">!</span><span>(</span><span style="color:#d69d85;">&quot;{}</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>, format_args!(</span><span style="color:#569cd6;">$</span><span>($arg)*))); </span><span>} </span><span> </span><span>#[doc(hidden)] </span><span style="color:#569cd6;">pub fn </span><span>_print(args: fmt::Arguments) { </span><span> </span><span style="color:#569cd6;">use </span><span>core::fmt::Write; </span><span> </span><span 
style="color:#b4cea8;">WRITER</span><span>.lock().write_fmt(args).unwrap(); </span><span>} </span></code></pre> <p>One thing that we changed from the original <code>println</code> definition is that we prefixed the invocations of the <code>print!</code> macro with <code>$crate</code> as well. This ensures that we don’t have to import the <code>print!</code> macro separately if we only want to use <code>println</code>.</p> <p>As in the standard library, we add the <code>#[macro_export]</code> attribute to both macros to make them available everywhere in our crate. Note that this places the macros in the root namespace of the crate, so importing them via <code>use crate::vga_buffer::println</code> does not work. Instead, we have to do <code>use crate::println</code>.</p> <p>The <code>_print</code> function locks our static <code>WRITER</code> and calls the <code>write_fmt</code> method on it. This method is from the <code>Write</code> trait, which we need to import. The additional <code>unwrap()</code> at the end panics if printing isn’t successful. But since we always return <code>Ok</code> in <code>write_str</code>, that should not happen.</p> <p>Since the macros need to be able to call <code>_print</code> from outside of the module, the function needs to be public. 
However, since we consider this a private implementation detail, we add the <a href="https://doc.rust-lang.org/nightly/rustdoc/write-documentation/the-doc-attribute.html#hidden"><code>doc(hidden)</code> attribute</a> to hide it from the generated documentation.</p> <h3 id="hello-world-using-println"><a class="zola-anchor" href="#hello-world-using-println" aria-label="Anchor link for: hello-world-using-println">🔗</a>Hello World using <code>println</code></h3> <p>Now we can use <code>println</code> in our <code>_start</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>Note that we don’t have to import the macro in the main function, because it already lives in the root namespace.</p> <p>As expected, we now see a <em>“Hello World!”</em> on the screen:</p> <p><img src="https://os.phil-opp.com/vga-text-mode/vga-hello-world.png" alt="QEMU printing “Hello World!”" /></p> <h3 id="printing-panic-messages"><a class="zola-anchor" href="#printing-panic-messages" aria-label="Anchor link for: printing-panic-messages">🔗</a>Printing Panic Messages</h3> <p>Now that we have a <code>println</code> macro, we can use it in our panic function to print the panic message and the location of the panic:</p> <pre data-lang="rust" 
style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/main.rs </span><span> </span><span style="color:#608b4e;">/// This function is called on panic. </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, info); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>When we now insert <code>panic!("Some panic message");</code> in our <code>_start</code> function, we get the following output:</p> <p><img src="https://os.phil-opp.com/vga-text-mode/vga-panic.png" alt="QEMU printing “panicked at ‘Some panic message’, src/main.rs:28:5”" /></p> <p>So we know not only that a panic has occurred, but also the panic message and where in the code it happened.</p> <h2 id="summary"><a class="zola-anchor" href="#summary" aria-label="Anchor link for: summary">🔗</a>Summary</h2> <p>In this post, we learned about the structure of the VGA text buffer and how it can be written to through the memory mapping at address <code>0xb8000</code>. We created a Rust module that encapsulates the unsafety of writing to this memory-mapped buffer and presents a safe and convenient interface to the outside.</p> <p>Thanks to cargo, we also saw how easy it is to add dependencies on third-party libraries. 
The two dependencies that we added, <code>lazy_static</code> and <code>spin</code>, are very useful in OS development and we will use them in more places in future posts.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>The next post explains how to set up Rust’s built-in unit test framework. We will then create some basic unit tests for the VGA buffer module from this post.</p> A Freestanding Rust Binary Sat, 10 Feb 2018 00:00:00 +0000 https://os.phil-opp.com/freestanding-rust-binary/ https://os.phil-opp.com/freestanding-rust-binary/ <p>The first step in creating our own operating system kernel is to create a Rust executable that does not link the standard library. This makes it possible to run Rust code on the <a href="https://en.wikipedia.org/wiki/Bare_machine">bare metal</a> without an underlying operating system.</p> <span id="continue-reading"></span> <p>This blog is openly developed on <a href="https://github.com/phil-opp/blog_os">GitHub</a>. If you have any problems or questions, please open an issue there. You can also leave comments <a href="https://os.phil-opp.com/freestanding-rust-binary/#comments">at the bottom</a>. The complete source code for this post can be found in the <a href="https://github.com/phil-opp/blog_os/tree/post-01"><code>post-01</code></a> branch.</p> <!-- fix for zola anchor checker (target is in template): <a id="comments"> --> <!-- toc --> <h2 id="introduction"><a class="zola-anchor" href="#introduction" aria-label="Anchor link for: introduction">🔗</a>Introduction</h2> <p>To write an operating system kernel, we need code that does not depend on any operating system features. This means that we can’t use threads, files, heap memory, the network, random numbers, standard output, or any other features requiring OS abstractions or specific hardware. 
This makes sense, since we’re trying to write our own OS and our own drivers.</p> <p>This means that we can’t use most of the <a href="https://doc.rust-lang.org/std/">Rust standard library</a>, but there are a lot of Rust features that we <em>can</em> use. For example, we can use <a href="https://doc.rust-lang.org/book/ch13-02-iterators.html">iterators</a>, <a href="https://doc.rust-lang.org/book/ch13-01-closures.html">closures</a>, <a href="https://doc.rust-lang.org/book/ch06-00-enums.html">pattern matching</a>, <a href="https://doc.rust-lang.org/core/option/">option</a> and <a href="https://doc.rust-lang.org/core/result/">result</a>, <a href="https://doc.rust-lang.org/core/macro.write.html">string formatting</a>, and of course the <a href="https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html">ownership system</a>. These features make it possible to write a kernel in a very expressive, high-level way without worrying about <a href="https://www.nayuki.io/page/undefined-behavior-in-c-and-cplusplus-programs">undefined behavior</a> or <a href="https://tonyarcieri.com/it-s-time-for-a-memory-safety-intervention">memory safety</a>.</p> <p>In order to create an OS kernel in Rust, we need to create an executable that can be run without an underlying operating system. Such an executable is often called a “freestanding” or “bare-metal” executable.</p> <p>This post describes the necessary steps to create a freestanding Rust binary and explains why the steps are needed. 
If you’re just interested in a minimal example, you can <strong><a href="https://os.phil-opp.com/freestanding-rust-binary/#summary">jump to the summary</a></strong>.</p> <h2 id="disabling-the-standard-library"><a class="zola-anchor" href="#disabling-the-standard-library" aria-label="Anchor link for: disabling-the-standard-library">🔗</a>Disabling the Standard Library</h2> <p>By default, all Rust crates link the <a href="https://doc.rust-lang.org/std/">standard library</a>, which depends on the operating system for features such as threads, files, or networking. It also depends on the C standard library <code>libc</code>, which closely interacts with OS services. Since our plan is to write an operating system, we can’t use any OS-dependent libraries. So we have to disable the automatic inclusion of the standard library through the <a href="https://doc.rust-lang.org/1.30.0/book/first-edition/using-rust-without-the-standard-library.html"><code>no_std</code> attribute</a>.</p> <p>We start by creating a new cargo application project. The easiest way to do this is through the command line:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>cargo new blog_os --bin --edition 2018 </span></code></pre> <p>I named the project <code>blog_os</code>, but of course you can choose your own name. The <code>--bin</code> flag specifies that we want to create an executable binary (in contrast to a library) and the <code>--edition 2018</code> flag specifies that we want to use the <a href="https://doc.rust-lang.org/nightly/edition-guide/rust-2018/index.html">2018 edition</a> of Rust for our crate. 
When we run the command, cargo creates the following directory structure for us:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>blog_os </span><span>├── Cargo.toml </span><span>└── src </span><span> └── main.rs </span></code></pre> <p>The <code>Cargo.toml</code> contains the crate configuration, for example the crate name, the author, the <a href="https://semver.org/">semantic version</a> number, and dependencies. The <code>src/main.rs</code> file contains the root module of our crate and our <code>main</code> function. You can compile your crate through <code>cargo build</code> and then run the compiled <code>blog_os</code> binary in the <code>target/debug</code> subfolder.</p> <h3 id="the-no-std-attribute"><a class="zola-anchor" href="#the-no-std-attribute" aria-label="Anchor link for: the-no-std-attribute">🔗</a>The <code>no_std</code> Attribute</h3> <p>Right now our crate implicitly links the standard library. Let’s try to disable this by adding the <a href="https://doc.rust-lang.org/1.30.0/book/first-edition/using-rust-without-the-standard-library.html"><code>no_std</code> attribute</a>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// main.rs </span><span> </span><span>#![no_std] </span><span> </span><span style="color:#569cd6;">fn </span><span>main() { </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello, world!&quot;</span><span>); </span><span>} </span></code></pre> <p>When we try to build it now (by running <code>cargo build</code>), the following error occurs:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: cannot find macro `println!` in this scope </span><span> --&gt; src/main.rs:4:5 </span><span> | </span><span>4 | println!(&quot;Hello, world!&quot;); </span><span> | ^^^^^^^ </span></code></pre> <p>The reason for this error is that the <a 
href="https://doc.rust-lang.org/std/macro.println.html"><code>println</code> macro</a> is part of the standard library, which we no longer include. So we can no longer print things. This makes sense, since <code>println</code> writes to <a href="https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29">standard output</a>, which is a special file descriptor provided by the operating system.</p> <p>So let’s remove the printing and try again with an empty main function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// main.rs </span><span> </span><span>#![no_std] </span><span> </span><span style="color:#569cd6;">fn </span><span>main() {} </span></code></pre> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo build </span><span>error: `#[panic_handler]` function required, but not found </span><span>error: language item required, but not found: `eh_personality` </span></code></pre> <p>Now the compiler is missing a <code>#[panic_handler]</code> function and a <em>language item</em>.</p> <h2 id="panic-implementation"><a class="zola-anchor" href="#panic-implementation" aria-label="Anchor link for: panic-implementation">🔗</a>Panic Implementation</h2> <p>The <code>panic_handler</code> attribute defines the function that the compiler should invoke when a <a href="https://doc.rust-lang.org/stable/book/ch09-01-unrecoverable-errors-with-panic.html">panic</a> occurs. 
The standard library provides its own panic handler function, but in a <code>no_std</code> environment we need to define it ourselves:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in main.rs </span><span> </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span> </span><span style="color:#608b4e;">/// This function is called on panic. </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(_info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>The <a href="https://doc.rust-lang.org/nightly/core/panic/struct.PanicInfo.html"><code>PanicInfo</code> parameter</a> contains the file and line where the panic happened and the optional panic message. The function should never return, so it is marked as a <a href="https://doc.rust-lang.org/1.30.0/book/first-edition/functions.html#diverging-functions">diverging function</a> by returning the <a href="https://doc.rust-lang.org/nightly/std/primitive.never.html">“never” type</a> <code>!</code>. There is not much we can do in this function for now, so we just loop indefinitely.</p> <h2 id="the-eh-personality-language-item"><a class="zola-anchor" href="#the-eh-personality-language-item" aria-label="Anchor link for: the-eh-personality-language-item">🔗</a>The <code>eh_personality</code> Language Item</h2> <p>Language items are special functions and types that are required internally by the compiler. 
For example, the <a href="https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html"><code>Copy</code></a> trait is a language item that tells the compiler which types have <a href="https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html"><em>copy semantics</em></a>. When we look at the <a href="https://github.com/rust-lang/rust/blob/485397e49a02a3b7ff77c17e4a3f16c653925cb3/src/libcore/marker.rs#L296-L299">implementation</a>, we see it has the special <code>#[lang = "copy"]</code> attribute that defines it as a language item.</p> <p>While providing custom implementations of language items is possible, it should only be done as a last resort. The reason is that language items are highly unstable implementation details and not even type checked (so the compiler doesn’t even check if a function has the right argument types). Fortunately, there is a more stable way to fix the above language item error.</p> <p>The <a href="https://github.com/rust-lang/rust/blob/edb368491551a77d77a48446d4ee88b35490c565/src/libpanic_unwind/gcc.rs#L11-L45"><code>eh_personality</code> language item</a> marks a function that is used for implementing <a href="https://www.bogotobogo.com/cplusplus/stackunwinding.php">stack unwinding</a>. By default, Rust uses unwinding to run the destructors of all live stack variables in case of a <a href="https://doc.rust-lang.org/stable/book/ch09-01-unrecoverable-errors-with-panic.html">panic</a>. This ensures that all used memory is freed and allows the parent thread to catch the panic and continue execution. Unwinding, however, is a complicated process and requires some OS-specific libraries (e.g. 
<a href="https://www.nongnu.org/libunwind/">libunwind</a> on Linux or <a href="https://docs.microsoft.com/en-us/windows/win32/debug/structured-exception-handling">structured exception handling</a> on Windows), so we don’t want to use it for our operating system.</p> <h3 id="disabling-unwinding"><a class="zola-anchor" href="#disabling-unwinding" aria-label="Anchor link for: disabling-unwinding">🔗</a>Disabling Unwinding</h3> <p>There are other use cases as well for which unwinding is undesirable, so Rust provides an option to <a href="https://github.com/rust-lang/rust/pull/32900">abort on panic</a> instead. This disables the generation of unwinding symbol information and thus considerably reduces binary size. There are multiple places where we can disable unwinding. The easiest way is to add the following lines to our <code>Cargo.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span>[</span><span style="color:#808080;">profile.dev</span><span>] </span><span style="color:#569cd6;">panic </span><span>= </span><span style="color:#d69d85;">&quot;abort&quot; </span><span> </span><span>[</span><span style="color:#808080;">profile.release</span><span>] </span><span style="color:#569cd6;">panic </span><span>= </span><span style="color:#d69d85;">&quot;abort&quot; </span></code></pre> <p>This sets the panic strategy to <code>abort</code> for both the <code>dev</code> profile (used for <code>cargo build</code>) and the <code>release</code> profile (used for <code>cargo build --release</code>). Now the <code>eh_personality</code> language item should no longer be required.</p> <p>We have now fixed both of the above errors. 
However, if we try to compile now, another error occurs:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo build </span><span>error: requires `start` lang_item </span></code></pre> <p>Our program is missing the <code>start</code> language item, which defines the entry point.</p> <h2 id="the-start-attribute"><a class="zola-anchor" href="#the-start-attribute" aria-label="Anchor link for: the-start-attribute">🔗</a>The <code>start</code> Attribute</h2> <p>One might think that the <code>main</code> function is the first function called when you run a program. However, most languages have a <a href="https://en.wikipedia.org/wiki/Runtime_system">runtime system</a>, which is responsible for things such as garbage collection (e.g. in Java) or software threads (e.g. goroutines in Go). This runtime needs to be called before <code>main</code>, since it needs to initialize itself.</p> <p>In a typical Rust binary that links the standard library, execution starts in a C runtime library called <code>crt0</code> (“C runtime zero”), which sets up the environment for a C application. This includes creating a stack and placing the arguments in the right registers. The C runtime then invokes the <a href="https://github.com/rust-lang/rust/blob/bb4d1491466d8239a7a5fd68bd605e3276e97afb/src/libstd/rt.rs#L32-L73">entry point of the Rust runtime</a>, which is marked by the <code>start</code> language item. Rust has only a very minimal runtime, which takes care of some small things such as setting up stack overflow guards or printing a backtrace on panic. The runtime then finally calls the <code>main</code> function.</p> <p>Our freestanding executable does not have access to the Rust runtime and <code>crt0</code>, so we need to define our own entry point. Implementing the <code>start</code> language item wouldn’t help, since it would still require <code>crt0</code>. 
Instead, we need to overwrite the <code>crt0</code> entry point directly.</p> <h3 id="overwriting-the-entry-point"><a class="zola-anchor" href="#overwriting-the-entry-point" aria-label="Anchor link for: overwriting-the-entry-point">🔗</a>Overwriting the Entry Point</h3> <p>To tell the Rust compiler that we don’t want to use the normal entry point chain, we add the <code>#![no_main]</code> attribute.</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#![no_std] </span><span>#![no_main] </span><span> </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span> </span><span style="color:#608b4e;">/// This function is called on panic. </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(_info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>You might notice that we removed the <code>main</code> function. The reason is that a <code>main</code> doesn’t make sense without an underlying runtime that calls it. Instead, we are now overwriting the operating system entry point with our own <code>_start</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>By using the <code>#[no_mangle]</code> attribute, we disable <a href="https://en.wikipedia.org/wiki/Name_mangling">name mangling</a> to ensure that the Rust compiler really outputs a function with the name <code>_start</code>. Without the attribute, the compiler would generate some cryptic <code>_ZN3blog_os4_start7hb173fedf945531caE</code> symbol to give every function a unique name. The attribute is required because we need to tell the name of the entry point function to the linker in the next step.</p> <p>We also have to mark the function as <code>extern "C"</code> to tell the compiler that it should use the <a href="https://en.wikipedia.org/wiki/Calling_convention">C calling convention</a> for this function (instead of the unspecified Rust calling convention). The reason for naming the function <code>_start</code> is that this is the default entry point name for most systems.</p> <p>The <code>!</code> return type means that the function is diverging, i.e. not allowed to ever return. This is required because the entry point is not called by any function, but invoked directly by the operating system or bootloader. So instead of returning, the entry point should e.g. invoke the <a href="https://en.wikipedia.org/wiki/Exit_(system_call)"><code>exit</code> system call</a> of the operating system. In our case, shutting down the machine could be a reasonable action, since there’s nothing left to do if a freestanding binary returns. For now, we fulfill the requirement by looping endlessly.</p> <p>When we run <code>cargo build</code> now, we get an ugly <em>linker</em> error.</p> <h2 id="linker-errors"><a class="zola-anchor" href="#linker-errors" aria-label="Anchor link for: linker-errors">🔗</a>Linker Errors</h2> <p>The linker is a program that combines the generated code into an executable. 
Since the executable format differs between Linux, Windows, and macOS, each system has its own linker that reports a different error. The fundamental cause of the errors is the same: the default configuration of the linker assumes that our program depends on the C runtime, which it does not.</p> <p>To solve the errors, we need to tell the linker that it should not include the C runtime. We can do this either by passing a certain set of arguments to the linker or by building for a bare metal target.</p> <h3 id="building-for-a-bare-metal-target"><a class="zola-anchor" href="#building-for-a-bare-metal-target" aria-label="Anchor link for: building-for-a-bare-metal-target">🔗</a>Building for a Bare Metal Target</h3> <p>By default, Rust tries to build an executable that is able to run in your current system environment. For example, if you’re using Windows on <code>x86_64</code>, Rust tries to build an <code>.exe</code> Windows executable that uses <code>x86_64</code> instructions. This environment is called your “host” system.</p> <p>To describe different environments, Rust uses a string called a <a href="https://clang.llvm.org/docs/CrossCompilation.html#target-triple"><em>target triple</em></a>. You can see the target triple for your host system by running <code>rustc --version --verbose</code>:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>rustc 1.35.0-nightly (474e7a648 2019-04-07) </span><span>binary: rustc </span><span>commit-hash: 474e7a6486758ea6fc761893b1a49cd9076fb0ab </span><span>commit-date: 2019-04-07 </span><span>host: x86_64-unknown-linux-gnu </span><span>release: 1.35.0-nightly </span><span>LLVM version: 8.0 </span></code></pre> <p>The above output is from an <code>x86_64</code> Linux system. 
We see that the <code>host</code> triple is <code>x86_64-unknown-linux-gnu</code>, which includes the CPU architecture (<code>x86_64</code>), the vendor (<code>unknown</code>), the operating system (<code>linux</code>), and the <a href="https://en.wikipedia.org/wiki/Application_binary_interface">ABI</a> (<code>gnu</code>).</p> <p>When we compile for our host triple, the Rust compiler and the linker assume that there is an underlying operating system such as Linux or Windows that uses the C runtime by default, which causes the linker errors. So, to avoid the linker errors, we can compile for a different environment with no underlying operating system.</p> <p>An example of such a bare metal environment is the <code>thumbv7em-none-eabihf</code> target triple, which describes an <a href="https://en.wikipedia.org/wiki/Embedded_system">embedded</a> <a href="https://en.wikipedia.org/wiki/ARM_architecture">ARM</a> system. The details are not important; all that matters is that the target triple has no underlying operating system, which is indicated by the <code>none</code> in the target triple. To be able to compile for this target, we need to add it in rustup:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>rustup target add thumbv7em-none-eabihf </span></code></pre> <p>This downloads a copy of the standard (and core) library for the system. Now we can build our freestanding executable for this target:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>cargo build --target thumbv7em-none-eabihf </span></code></pre> <p>By passing a <code>--target</code> argument, we <a href="https://en.wikipedia.org/wiki/Cross_compiler">cross compile</a> our executable for a bare metal target system. Since the target system has no operating system, the linker does not try to link the C runtime, and our build succeeds without any linker errors.</p> <p>This is the approach that we will use for building our OS kernel. 
Instead of <code>thumbv7em-none-eabihf</code>, we will use a <a href="https://doc.rust-lang.org/rustc/targets/custom.html">custom target</a> that describes an <code>x86_64</code> bare metal environment. The details will be explained in the next post.</p> <h3 id="linker-arguments"><a class="zola-anchor" href="#linker-arguments" aria-label="Anchor link for: linker-arguments">🔗</a>Linker Arguments</h3> <p>Instead of compiling for a bare metal system, it is also possible to resolve the linker errors by passing a certain set of arguments to the linker. This isn’t the approach that we will use for our kernel, so this section is optional and only provided for completeness. Click on <em>“Linker Arguments”</em> below to show the optional content.</p> <details> <summary>Linker Arguments</summary> <p>In this section we discuss the linker errors that occur on Linux, Windows, and macOS, and explain how to solve them by passing additional arguments to the linker. Note that the executable format and the linker differ between operating systems, so a different set of arguments is required for each system.</p> <h4 id="linux"><a class="zola-anchor" href="#linux" aria-label="Anchor link for: linux">🔗</a>Linux</h4> <p>On Linux the following linker error occurs (shortened):</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: linking with `cc` failed: exit code: 1 </span><span> | </span><span> = note: &quot;cc&quot; […] </span><span> = note: /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start&#39;: </span><span> (.text+0x12): undefined reference to `__libc_csu_fini&#39; </span><span> /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start&#39;: </span><span> (.text+0x19): undefined reference to `__libc_csu_init&#39; </span><span> /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start&#39;: </span><span> (.text+0x25): undefined reference to `__libc_start_main&#39; </span><span> collect2: error: ld returned 1 exit status 
</span></code></pre> <p>The problem is that the linker includes the startup routine of the C runtime by default, which is also called <code>_start</code>. It requires some symbols of the C standard library <code>libc</code> that we don’t include due to the <code>no_std</code> attribute, so the linker can’t resolve these references. To solve this, we can tell the linker that it should not link the C startup routine by passing the <code>-nostartfiles</code> flag.</p> <p>One way to pass linker arguments via cargo is the <code>cargo rustc</code> command. The command behaves exactly like <code>cargo build</code>, but allows passing options to <code>rustc</code>, the underlying Rust compiler. <code>rustc</code> has the <code>-C link-arg</code> flag, which passes an argument to the linker. Combined, our new build command looks like this:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>cargo rustc -- -C link-arg=-nostartfiles </span></code></pre> <p>Now our crate builds as a freestanding executable on Linux!</p> <p>We didn’t need to specify the name of our entry point function explicitly since the linker looks for a function with the name <code>_start</code> by default.</p> <h4 id="windows"><a class="zola-anchor" href="#windows" aria-label="Anchor link for: windows">🔗</a>Windows</h4> <p>On Windows, a different linker error occurs (shortened):</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: linking with `link.exe` failed: exit code: 1561 </span><span> | </span><span> = note: &quot;C:\\Program Files (x86)\\…\\link.exe&quot; […] </span><span> = note: LINK : fatal error LNK1561: entry point must be defined </span></code></pre> <p>The “entry point must be defined” error means that the linker can’t find the entry point. On Windows, the default entry point name <a href="https://docs.microsoft.com/en-us/cpp/build/reference/entry-entry-point-symbol">depends on the subsystem used</a>. 
For the <code>CONSOLE</code> subsystem, the linker looks for a function named <code>mainCRTStartup</code> and for the <code>WINDOWS</code> subsystem, it looks for a function named <code>WinMainCRTStartup</code>. To override the default and tell the linker to look for our <code>_start</code> function instead, we can pass an <code>/ENTRY</code> argument to the linker:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>cargo rustc -- -C link-arg=/ENTRY:_start </span></code></pre> <p>From the different argument format we clearly see that the Windows linker is a completely different program than the Linux linker.</p> <p>Now a different linker error occurs:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: linking with `link.exe` failed: exit code: 1221 </span><span> | </span><span> = note: &quot;C:\\Program Files (x86)\\…\\link.exe&quot; […] </span><span> = note: LINK : fatal error LNK1221: a subsystem can&#39;t be inferred and must be </span><span> defined </span></code></pre> <p>This error occurs because Windows executables can use different <a href="https://docs.microsoft.com/en-us/cpp/build/reference/entry-entry-point-symbol">subsystems</a>. For normal programs, they are inferred depending on the entry point name: If the entry point is named <code>main</code>, the <code>CONSOLE</code> subsystem is used, and if the entry point is named <code>WinMain</code>, the <code>WINDOWS</code> subsystem is used. Since our <code>_start</code> function has a different name, we need to specify the subsystem explicitly:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>cargo rustc -- -C link-args=&quot;/ENTRY:_start /SUBSYSTEM:console&quot; </span></code></pre> <p>We use the <code>CONSOLE</code> subsystem here, but the <code>WINDOWS</code> subsystem would work too. 
Instead of passing <code>-C link-arg</code> multiple times, we use <code>-C link-args</code>, which takes a space-separated list of arguments.</p> <p>With this command, our executable should build successfully on Windows.</p> <h4 id="macos"><a class="zola-anchor" href="#macos" aria-label="Anchor link for: macos">🔗</a>macOS</h4> <p>On macOS, the following linker error occurs (shortened):</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: linking with `cc` failed: exit code: 1 </span><span> | </span><span> = note: &quot;cc&quot; […] </span><span> = note: ld: entry point (_main) undefined. for architecture x86_64 </span><span> clang: error: linker command failed with exit code 1 […] </span></code></pre> <p>This error message tells us that the linker can’t find an entry point function with the default name <code>main</code> (for some reason, all functions are prefixed with a <code>_</code> on macOS). To set the entry point to our <code>_start</code> function, we pass the <code>-e</code> linker argument:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>cargo rustc -- -C link-args=&quot;-e __start&quot; </span></code></pre> <p>The <code>-e</code> flag specifies the name of the entry point function. 
Since all functions have an additional <code>_</code> prefix on macOS, we need to set the entry point to <code>__start</code> instead of <code>_start</code>.</p> <p>Now the following linker error occurs:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: linking with `cc` failed: exit code: 1 </span><span> | </span><span> = note: &quot;cc&quot; […] </span><span> = note: ld: dynamic main executables must link with libSystem.dylib </span><span> for architecture x86_64 </span><span> clang: error: linker command failed with exit code 1 […] </span></code></pre> <p>macOS <a href="https://developer.apple.com/library/archive/qa/qa1118/_index.html">does not officially support statically linked binaries</a> and requires programs to link the <code>libSystem</code> library by default. To override this and link a static binary, we pass the <code>-static</code> flag to the linker:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>cargo rustc -- -C link-args=&quot;-e __start -static&quot; </span></code></pre> <p>This still does not suffice, as a third linker error occurs:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: linking with `cc` failed: exit code: 1 </span><span> | </span><span> = note: &quot;cc&quot; […] </span><span> = note: ld: library not found for -lcrt0.o </span><span> clang: error: linker command failed with exit code 1 […] </span></code></pre> <p>This error occurs because programs on macOS link to <code>crt0</code> (“C runtime zero”) by default. 
This is similar to the error we had on Linux and can also be solved by adding the <code>-nostartfiles</code> linker argument:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>cargo rustc -- -C link-args=&quot;-e __start -static -nostartfiles&quot; </span></code></pre> <p>Now our program should build successfully on macOS.</p> <h4 id="unifying-the-build-commands"><a class="zola-anchor" href="#unifying-the-build-commands" aria-label="Anchor link for: unifying-the-build-commands">🔗</a>Unifying the Build Commands</h4> <p>Right now we have different build commands depending on the host platform, which is not ideal. To avoid this, we can create a file named <code>.cargo/config.toml</code> that contains the platform-specific arguments:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in .cargo/config.toml </span><span> </span><span>[</span><span style="color:#808080;">target.</span><span style="color:#d69d85;">&#39;cfg(target_os = &quot;linux&quot;)&#39;</span><span>] </span><span style="color:#569cd6;">rustflags </span><span>= [</span><span style="color:#d69d85;">&quot;-C&quot;</span><span>, </span><span style="color:#d69d85;">&quot;link-arg=-nostartfiles&quot;</span><span>] </span><span> </span><span>[</span><span style="color:#808080;">target.</span><span style="color:#d69d85;">&#39;cfg(target_os = &quot;windows&quot;)&#39;</span><span>] </span><span style="color:#569cd6;">rustflags </span><span>= [</span><span style="color:#d69d85;">&quot;-C&quot;</span><span>, </span><span style="color:#d69d85;">&quot;link-args=/ENTRY:_start /SUBSYSTEM:console&quot;</span><span>] </span><span> </span><span>[</span><span style="color:#808080;">target.</span><span style="color:#d69d85;">&#39;cfg(target_os = &quot;macos&quot;)&#39;</span><span>] </span><span style="color:#569cd6;">rustflags </span><span>= [</span><span 
style="color:#d69d85;">&quot;-C&quot;</span><span>, </span><span style="color:#d69d85;">&quot;link-args=-e __start -static -nostartfiles&quot;</span><span>] </span></code></pre> <p>The <code>rustflags</code> key contains arguments that are automatically added to every invocation of <code>rustc</code>. For more information on the <code>.cargo/config.toml</code> file, check out the <a href="https://doc.rust-lang.org/cargo/reference/config.html">official documentation</a>.</p> <p>Now our program should be buildable on all three platforms with a simple <code>cargo build</code>.</p> <h4 id="should-you-do-this"><a class="zola-anchor" href="#should-you-do-this" aria-label="Anchor link for: should-you-do-this">🔗</a>Should You Do This?</h4> <p>While it’s possible to build a freestanding executable for Linux, Windows, and macOS, it’s probably not a good idea. The reason is that our executable still expects various things, for example that a stack is initialized when the <code>_start</code> function is called. Without the C runtime, some of these requirements might not be fulfilled, which might cause our program to fail, e.g. 
through a segmentation fault.</p> <p>If you want to create a minimal binary that runs on top of an existing operating system, including <code>libc</code> and setting the <code>#[start]</code> attribute as described <a href="https://doc.rust-lang.org/1.16.0/book/no-stdlib.html">here</a> is probably a better idea.</p> </details> <h2 id="summary"><a class="zola-anchor" href="#summary" aria-label="Anchor link for: summary">🔗</a>Summary</h2> <p>A minimal freestanding Rust binary looks like this:</p> <p><code>src/main.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#![no_std] </span><span style="color:#608b4e;">// don&#39;t link the Rust standard library </span><span>#![no_main] </span><span style="color:#608b4e;">// disable all Rust-level entry points </span><span> </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span> </span><span>#[no_mangle] </span><span style="color:#608b4e;">// don&#39;t mangle the name of this function </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#608b4e;">// this function is the entry point, since the linker looks for a function </span><span> </span><span style="color:#608b4e;">// named `_start` by default </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span><span> </span><span style="color:#608b4e;">/// This function is called on panic. </span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(_info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p><code>Cargo.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span>[</span><span style="color:#808080;">package</span><span>] </span><span style="color:#569cd6;">name </span><span>= </span><span style="color:#d69d85;">&quot;crate_name&quot; </span><span style="color:#569cd6;">version </span><span>= </span><span style="color:#d69d85;">&quot;0.1.0&quot; </span><span style="color:#569cd6;">authors </span><span>= [</span><span style="color:#d69d85;">&quot;Author Name &lt;[email protected]&gt;&quot;</span><span>] </span><span> </span><span style="color:#608b4e;"># the profile used for `cargo build` </span><span>[</span><span style="color:#808080;">profile.dev</span><span>] </span><span style="color:#569cd6;">panic </span><span>= </span><span style="color:#d69d85;">&quot;abort&quot; </span><span style="color:#608b4e;"># disable stack unwinding on panic </span><span> </span><span style="color:#608b4e;"># the profile used for `cargo build --release` </span><span>[</span><span style="color:#808080;">profile.release</span><span>] </span><span style="color:#569cd6;">panic </span><span>= </span><span style="color:#d69d85;">&quot;abort&quot; </span><span style="color:#608b4e;"># disable stack unwinding on panic </span></code></pre> <p>To build this binary, we need to compile for a bare metal target such as <code>thumbv7em-none-eabihf</code>:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>cargo build --target thumbv7em-none-eabihf </span></code></pre> <p>Alternatively, we can compile it for the host system by passing additional linker arguments:</p> <pre data-lang="bash" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-bash "><code class="language-bash" data-lang="bash"><span style="color:#608b4e;"># Linux 
</span><span>cargo rustc</span><span style="color:#569cd6;"> --</span><span> -C link-arg=-nostartfiles </span><span style="color:#608b4e;"># Windows </span><span>cargo rustc</span><span style="color:#569cd6;"> --</span><span> -C link-args=</span><span style="color:#d69d85;">&quot;/ENTRY:_start /SUBSYSTEM:console&quot; </span><span style="color:#608b4e;"># macOS </span><span>cargo rustc</span><span style="color:#569cd6;"> --</span><span> -C link-args=</span><span style="color:#d69d85;">&quot;-e __start -static -nostartfiles&quot; </span></code></pre> <p>Note that this is just a minimal example of a freestanding Rust binary. This binary expects various things, for example, that a stack is initialized when the <code>_start</code> function is called. <strong>So for any real use of such a binary, more steps are required</strong>.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>The <a href="https://os.phil-opp.com/minimal-rust-kernel/">next post</a> explains the steps needed for turning our freestanding binary into a minimal operating system kernel. This includes creating a custom target, combining our executable with a bootloader, and learning how to print something to the screen.</p> A Minimal Rust Kernel Sat, 10 Feb 2018 00:00:00 +0000 https://os.phil-opp.com/minimal-rust-kernel/ https://os.phil-opp.com/minimal-rust-kernel/ <p>In this post, we create a minimal 64-bit Rust kernel for the x86 architecture. We build upon the <a href="https://os.phil-opp.com/freestanding-rust-binary/">freestanding Rust binary</a> from the previous post to create a bootable disk image that prints something to the screen.</p> <span id="continue-reading"></span> <p>This blog is openly developed on <a href="https://github.com/phil-opp/blog_os">GitHub</a>. If you have any problems or questions, please open an issue there. 
You can also leave comments <a href="https://os.phil-opp.com/minimal-rust-kernel/#comments">at the bottom</a>. The complete source code for this post can be found in the <a href="https://github.com/phil-opp/blog_os/tree/post-02"><code>post-02</code></a> branch.</p> <!-- fix for zola anchor checker (target is in template): <a id="comments"> --> <!-- toc --> <h2 id="the-boot-process"><a class="zola-anchor" href="#the-boot-process" aria-label="Anchor link for: the-boot-process">🔗</a>The Boot Process</h2> <p>When you turn on a computer, it begins executing firmware code that is stored in motherboard <a href="https://en.wikipedia.org/wiki/Read-only_memory">ROM</a>. This code performs a <a href="https://en.wikipedia.org/wiki/Power-on_self-test">power-on self-test</a>, detects available RAM, and pre-initializes the CPU and hardware. Afterwards, it looks for a bootable disk and starts booting the operating system kernel.</p> <p>On x86, there are two firmware standards: the “Basic Input/Output System” (<strong><a href="https://en.wikipedia.org/wiki/BIOS">BIOS</a></strong>) and the newer “Unified Extensible Firmware Interface” (<strong><a href="https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface">UEFI</a></strong>). The BIOS standard is old and outdated, but simple and well supported on any x86 machine since the 1980s. UEFI, in contrast, is more modern and has many more features, but is more complex to set up (at least in my opinion).</p> <p>Currently, we only provide BIOS support, but support for UEFI is planned, too. If you’d like to help us with this, check out the <a href="https://github.com/phil-opp/blog_os/issues/349">GitHub issue</a>.</p> <h3 id="bios-boot"><a class="zola-anchor" href="#bios-boot" aria-label="Anchor link for: bios-boot">🔗</a>BIOS Boot</h3> <p>Almost all x86 systems have support for BIOS booting, including newer UEFI-based machines that use an emulated BIOS. 
This is great, because you can use the same boot logic across x86 machines going back decades. But this wide compatibility is at the same time the biggest disadvantage of BIOS booting, because it means that the CPU is put into a 16-bit compatibility mode called <a href="https://en.wikipedia.org/wiki/Real_mode">real mode</a> before booting so that archaic bootloaders from the 1980s still work.</p> <p>But let’s start from the beginning:</p> <p>When you turn on a computer, it loads the BIOS from some special flash memory located on the motherboard. The BIOS runs self-test and initialization routines of the hardware, then it looks for bootable disks. If it finds one, control is transferred to its <em>bootloader</em>, which is a 512-byte portion of executable code stored at the disk’s beginning. Most bootloaders are larger than 512 bytes, so bootloaders are commonly split into a small first stage, which fits into 512 bytes, and a second stage, which is subsequently loaded by the first stage.</p> <p>The bootloader has to determine the location of the kernel image on the disk and load it into memory. It also needs to switch the CPU from the 16-bit <a href="https://en.wikipedia.org/wiki/Real_mode">real mode</a> first to the 32-bit <a href="https://en.wikipedia.org/wiki/Protected_mode">protected mode</a>, and then to the 64-bit <a href="https://en.wikipedia.org/wiki/Long_mode">long mode</a>, where 64-bit registers and the complete main memory are available. Its third job is to query certain information (such as a memory map) from the BIOS and pass it to the OS kernel.</p> <p>Writing a bootloader is a bit cumbersome as it requires assembly language and a lot of non-insightful steps like “write this magic value to this processor register”. 
Therefore, we don’t cover bootloader creation in this post and instead provide a tool named <a href="https://github.com/rust-osdev/bootimage">bootimage</a> that automatically prepends a bootloader to your kernel.</p> <p>If you are interested in building your own bootloader, stay tuned: a set of posts on this topic is already planned! <!-- , check out our “_[Writing a Bootloader]_” posts, where we explain in detail how a bootloader is built. --></p> <h4 id="the-multiboot-standard"><a class="zola-anchor" href="#the-multiboot-standard" aria-label="Anchor link for: the-multiboot-standard">🔗</a>The Multiboot Standard</h4> <p>To avoid every operating system implementing its own bootloader that is compatible with only a single OS, the <a href="https://en.wikipedia.org/wiki/Free_Software_Foundation">Free Software Foundation</a> created an open bootloader standard called <a href="https://wiki.osdev.org/Multiboot">Multiboot</a> in 1995. The standard defines an interface between the bootloader and the operating system, so that any Multiboot-compliant bootloader can load any Multiboot-compliant operating system. The reference implementation is <a href="https://en.wikipedia.org/wiki/GNU_GRUB">GNU GRUB</a>, which is the most popular bootloader for Linux systems.</p> <p>To make a kernel Multiboot-compliant, one just needs to insert a so-called <a href="https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#OS-image-format">Multiboot header</a> at the beginning of the kernel file. This makes it very easy to boot an OS from GRUB. However, GRUB and the Multiboot standard have some problems too:</p> <ul> <li>They support only the 32-bit protected mode. This means that you still have to do the CPU configuration to switch to the 64-bit long mode.</li> <li>They are designed to make the bootloader simple instead of the kernel. 
For example, the kernel needs to be linked with an <a href="https://wiki.osdev.org/Multiboot#Multiboot_2">adjusted default page size</a>, because GRUB can’t find the Multiboot header otherwise. Another example is that the <a href="https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#Boot-information-format">boot information</a>, which is passed to the kernel, contains lots of architecture-dependent structures instead of providing clean abstractions.</li> <li>Both GRUB and the Multiboot standard are only sparsely documented.</li> <li>GRUB needs to be installed on the host system to create a bootable disk image from the kernel file. This makes development on Windows or macOS more difficult.</li> </ul> <p>Because of these drawbacks, we decided not to use GRUB or the Multiboot standard. However, we plan to add Multiboot support to our <a href="https://github.com/rust-osdev/bootimage">bootimage</a> tool, so that it’s possible to load your kernel on a GRUB system too. If you’re interested in writing a Multiboot-compliant kernel, check out the <a href="https://os.phil-opp.com/edition-1/">first edition</a> of this blog series.</p> <h3 id="uefi"><a class="zola-anchor" href="#uefi" aria-label="Anchor link for: uefi">🔗</a>UEFI</h3> <p>(We don’t provide UEFI support at the moment, but we would love to! If you’d like to help, please tell us in the <a href="https://github.com/phil-opp/blog_os/issues/349">GitHub issue</a>.)</p> <h2 id="a-minimal-kernel"><a class="zola-anchor" href="#a-minimal-kernel" aria-label="Anchor link for: a-minimal-kernel">🔗</a>A Minimal Kernel</h2> <p>Now that we roughly know how a computer boots, it’s time to create our own minimal kernel. Our goal is to create a disk image that prints a “Hello World!” to the screen when booted. 
We do this by extending the previous post’s <a href="https://os.phil-opp.com/freestanding-rust-binary/">freestanding Rust binary</a>.</p> <p>As you may remember, we built the freestanding binary through <code>cargo</code>, but depending on the operating system, we needed different entry point names and compile flags. That’s because <code>cargo</code> builds for the <em>host system</em> by default, i.e., the system you’re running on. This isn’t something we want for our kernel, because a kernel that runs on top of, e.g., Windows, does not make much sense. Instead, we want to compile for a clearly defined <em>target system</em>.</p> <h3 id="installing-rust-nightly"><a class="zola-anchor" href="#installing-rust-nightly" aria-label="Anchor link for: installing-rust-nightly">🔗</a>Installing Rust Nightly</h3> <p>Rust has three release channels: <em>stable</em>, <em>beta</em>, and <em>nightly</em>. The Rust Book explains the difference between these channels really well, so take a minute and <a href="https://doc.rust-lang.org/book/appendix-07-nightly-rust.html#choo-choo-release-channels-and-riding-the-trains">check it out</a>. For building an operating system, we will need some experimental features that are only available on the nightly channel, so we need to install a nightly version of Rust.</p> <p>To manage Rust installations, I highly recommend <a href="https://www.rustup.rs/">rustup</a>. It allows you to install nightly, beta, and stable compilers side-by-side and makes it easy to update them. With rustup, you can use a nightly compiler for the current directory by running <code>rustup override set nightly</code>. Alternatively, you can add a file called <code>rust-toolchain</code> with the content <code>nightly</code> to the project’s root directory. 
You can check that you have a nightly version installed by running <code>rustc --version</code>: The version number should contain <code>-nightly</code> at the end.</p> <p>The nightly compiler allows us to opt in to various experimental features by using so-called <em>feature flags</em> at the top of our file. For example, when the <a href="https://doc.rust-lang.org/stable/reference/inline-assembly.html"><code>asm!</code> macro</a> for inline assembly was still experimental, it could be enabled by adding <code>#![feature(asm)]</code> to the top of our <code>main.rs</code> (the macro has since been stabilized). Note that such experimental features are completely unstable, which means that future Rust versions might change or remove them without prior warning. For this reason, we will only use them if absolutely necessary.</p> <h3 id="target-specification"><a class="zola-anchor" href="#target-specification" aria-label="Anchor link for: target-specification">🔗</a>Target Specification</h3> <p>Cargo supports different target systems through the <code>--target</code> parameter. The target is described by a so-called <em><a href="https://clang.llvm.org/docs/CrossCompilation.html#target-triple">target triple</a></em>, which specifies the CPU architecture, the vendor, the operating system, and the <a href="https://stackoverflow.com/a/2456882">ABI</a>. For example, the <code>x86_64-unknown-linux-gnu</code> target triple describes a system with an <code>x86_64</code> CPU, no clear vendor, and a Linux operating system with the GNU ABI. Rust supports <a href="https://forge.rust-lang.org/release/platform-support.html">many different target triples</a>, including <code>arm-linux-androideabi</code> for Android or <a href="https://www.hellorust.com/setup/wasm-target/"><code>wasm32-unknown-unknown</code> for WebAssembly</a>.</p> <p>For our target system, however, we require some special configuration parameters (e.g. 
no underlying OS), so none of the <a href="https://forge.rust-lang.org/release/platform-support.html">existing target triples</a> fits. Fortunately, Rust allows us to define <a href="https://doc.rust-lang.org/nightly/rustc/targets/custom.html">our own target</a> through a JSON file. For example, a JSON file that describes the <code>x86_64-unknown-linux-gnu</code> target looks like this:</p> <pre data-lang="json" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-json "><code class="language-json" data-lang="json"><span>{ </span><span> </span><span style="color:#d69d85;">&quot;llvm-target&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64-unknown-linux-gnu&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;data-layout&quot;</span><span>: </span><span style="color:#d69d85;">&quot;e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;arch&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-endian&quot;</span><span>: </span><span style="color:#d69d85;">&quot;little&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-pointer-width&quot;</span><span>: </span><span style="color:#d69d85;">&quot;64&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-c-int-width&quot;</span><span>: </span><span style="color:#d69d85;">&quot;32&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;os&quot;</span><span>: </span><span style="color:#d69d85;">&quot;linux&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;executables&quot;</span><span>: </span><span style="color:#569cd6;">true</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;linker-flavor&quot;</span><span>: </span><span 
style="color:#d69d85;">&quot;gcc&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;pre-link-args&quot;</span><span>: [</span><span style="color:#d69d85;">&quot;-m64&quot;</span><span>], </span><span> </span><span style="color:#d69d85;">&quot;morestack&quot;</span><span>: </span><span style="color:#569cd6;">false </span><span>} </span></code></pre> <p>Most fields are required by LLVM to generate code for that platform. For example, the <a href="https://llvm.org/docs/LangRef.html#data-layout"><code>data-layout</code></a> field defines the size of various integer, floating point, and pointer types. Then there are fields that Rust uses for conditional compilation, such as <code>target-pointer-width</code>. The third kind of field defines how the crate should be built. For example, the <code>pre-link-args</code> field specifies arguments passed to the <a href="https://en.wikipedia.org/wiki/Linker_(computing)">linker</a>.</p> <p>We also target <code>x86_64</code> systems with our kernel, so our target specification will look very similar to the one above. 
Let’s start by creating an <code>x86_64-blog_os.json</code> file (choose any name you like) with the common content:</p> <pre data-lang="json" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-json "><code class="language-json" data-lang="json"><span>{ </span><span> </span><span style="color:#d69d85;">&quot;llvm-target&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64-unknown-none&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;data-layout&quot;</span><span>: </span><span style="color:#d69d85;">&quot;e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;arch&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-endian&quot;</span><span>: </span><span style="color:#d69d85;">&quot;little&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-pointer-width&quot;</span><span>: </span><span style="color:#d69d85;">&quot;64&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-c-int-width&quot;</span><span>: </span><span style="color:#d69d85;">&quot;32&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;os&quot;</span><span>: </span><span style="color:#d69d85;">&quot;none&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;executables&quot;</span><span>: </span><span style="color:#569cd6;">true </span><span>} </span></code></pre> <p>Note that we changed the OS in the <code>llvm-target</code> and the <code>os</code> field to <code>none</code>, because we will run on bare metal.</p> <p>We add the following build-related entries:</p> <pre data-lang="json" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-json "><code class="language-json" data-lang="json"><span 
style="color:#d69d85;">&quot;linker-flavor&quot;</span><span>: </span><span style="color:#d69d85;">&quot;ld.lld&quot;</span><span>, </span><span style="color:#d69d85;">&quot;linker&quot;</span><span>: </span><span style="color:#d69d85;">&quot;rust-lld&quot;</span><span>, </span></code></pre> <p>Instead of using the platform’s default linker (which might not support Linux targets), we use the cross-platform <a href="https://lld.llvm.org/">LLD</a> linker that is shipped with Rust for linking our kernel.</p> <pre data-lang="json" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-json "><code class="language-json" data-lang="json"><span style="color:#d69d85;">&quot;panic-strategy&quot;</span><span>: </span><span style="color:#d69d85;">&quot;abort&quot;</span><span>, </span></code></pre> <p>This setting specifies that the target doesn’t support <a href="https://www.bogotobogo.com/cplusplus/stackunwinding.php">stack unwinding</a> on panic, so instead the program should abort directly. This has the same effect as the <code>panic = "abort"</code> option in our Cargo.toml, so we can remove it from there. (Note that, in contrast to the Cargo.toml option, this target option also applies when we recompile the <code>core</code> library later in this post. So, even if you prefer to keep the Cargo.toml option, make sure to include this option.)</p> <pre data-lang="json" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-json "><code class="language-json" data-lang="json"><span style="color:#d69d85;">&quot;disable-redzone&quot;</span><span>: </span><span style="color:#569cd6;">true</span><span>, </span></code></pre> <p>We’re writing a kernel, so we’ll need to handle interrupts at some point. To do that safely, we have to disable a certain stack pointer optimization called the <em>“red zone”</em>, because it would cause stack corruption otherwise. 
For more information, see our separate post about <a href="https://os.phil-opp.com/red-zone/">disabling the red zone</a>.</p> <pre data-lang="json" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-json "><code class="language-json" data-lang="json"><span style="color:#d69d85;">&quot;features&quot;</span><span>: </span><span style="color:#d69d85;">&quot;-mmx,-sse,+soft-float&quot;</span><span>, </span></code></pre> <p>The <code>features</code> field enables/disables target features. We disable the <code>mmx</code> and <code>sse</code> features by prefixing them with a minus and enable the <code>soft-float</code> feature by prefixing it with a plus. Note that there must be no spaces between different flags, otherwise LLVM fails to interpret the features string.</p> <p>The <code>mmx</code> and <code>sse</code> features determine support for <a href="https://en.wikipedia.org/wiki/SIMD">Single Instruction Multiple Data (SIMD)</a> instructions, which can often speed up programs significantly. However, using the large SIMD registers in OS kernels leads to performance problems. The reason is that the kernel needs to restore all registers to their original state before continuing an interrupted program. This means that the kernel has to save the complete SIMD state to main memory on each system call or hardware interrupt. Since the SIMD state is very large (512–1600 bytes) and interrupts can occur very often, these additional save/restore operations considerably harm performance. To avoid this, we disable SIMD for our kernel (not for applications running on top!).</p> <p>A problem with disabling SIMD is that floating point operations on <code>x86_64</code> require SIMD registers by default. 
To solve this problem, we add the <code>soft-float</code> feature, which emulates all floating point operations through software functions based on normal integers.</p> <p>For more information, see our post on <a href="https://os.phil-opp.com/disable-simd/">disabling SIMD</a>.</p> <h4 id="putting-it-together"><a class="zola-anchor" href="#putting-it-together" aria-label="Anchor link for: putting-it-together">🔗</a>Putting it Together</h4> <p>Our target specification file now looks like this:</p> <pre data-lang="json" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-json "><code class="language-json" data-lang="json"><span>{ </span><span> </span><span style="color:#d69d85;">&quot;llvm-target&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64-unknown-none&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;data-layout&quot;</span><span>: </span><span style="color:#d69d85;">&quot;e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;arch&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-endian&quot;</span><span>: </span><span style="color:#d69d85;">&quot;little&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-pointer-width&quot;</span><span>: </span><span style="color:#d69d85;">&quot;64&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-c-int-width&quot;</span><span>: </span><span style="color:#d69d85;">&quot;32&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;os&quot;</span><span>: </span><span style="color:#d69d85;">&quot;none&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;executables&quot;</span><span>: </span><span style="color:#569cd6;">true</span><span>, </span><span> </span><span 
style="color:#d69d85;">&quot;linker-flavor&quot;</span><span>: </span><span style="color:#d69d85;">&quot;ld.lld&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;linker&quot;</span><span>: </span><span style="color:#d69d85;">&quot;rust-lld&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;panic-strategy&quot;</span><span>: </span><span style="color:#d69d85;">&quot;abort&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;disable-redzone&quot;</span><span>: </span><span style="color:#569cd6;">true</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;features&quot;</span><span>: </span><span style="color:#d69d85;">&quot;-mmx,-sse,+soft-float&quot; </span><span>} </span></code></pre> <h3 id="building-our-kernel"><a class="zola-anchor" href="#building-our-kernel" aria-label="Anchor link for: building-our-kernel">🔗</a>Building our Kernel</h3> <p>Compiling for our new target will use Linux conventions, since the ld.lld linker-flavor instructs llvm to compile with the <code>-flavor gnu</code> flag (for more linker options, see <a href="https://doc.rust-lang.org/rustc/codegen-options/index.html#linker-flavor">the rustc documentation</a>). This means that we need an entry point named <code>_start</code> as described in the <a href="https://os.phil-opp.com/freestanding-rust-binary/">previous post</a>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// src/main.rs </span><span> </span><span>#![no_std] </span><span style="color:#608b4e;">// don&#39;t link the Rust standard library </span><span>#![no_main] </span><span style="color:#608b4e;">// disable all Rust-level entry points </span><span> </span><span style="color:#569cd6;">use </span><span>core::panic::PanicInfo; </span><span> </span><span style="color:#608b4e;">/// This function is called on panic. 
</span><span>#[panic_handler] </span><span style="color:#569cd6;">fn </span><span>panic(_info: </span><span style="color:#569cd6;">&amp;</span><span>PanicInfo) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span><span> </span><span>#[no_mangle] </span><span style="color:#608b4e;">// don&#39;t mangle the name of this function </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#608b4e;">// this function is the entry point, since the linker looks for a function </span><span> </span><span style="color:#608b4e;">// named `_start` by default </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>Note that the entry point needs to be called <code>_start</code> regardless of your host OS.</p> <p>We can now build the kernel for our new target by passing the name of the JSON file as <code>--target</code>:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo build --target x86_64-blog_os.json </span><span> </span><span>error[E0463]: can&#39;t find crate for `core` </span></code></pre> <p>It fails! The error tells us that the Rust compiler no longer finds the <a href="https://doc.rust-lang.org/nightly/core/index.html"><code>core</code> library</a>. This library contains basic Rust types such as <code>Result</code>, <code>Option</code>, and iterators, and is implicitly linked to all <code>no_std</code> crates.</p> <p>The problem is that the core library is distributed together with the Rust compiler as a <em>precompiled</em> library. So it is only valid for supported host triples (e.g., <code>x86_64-unknown-linux-gnu</code>) but not for our custom target. 
If we want to compile code for other targets, we need to recompile <code>core</code> for these targets first.</p> <h4 id="the-build-std-option"><a class="zola-anchor" href="#the-build-std-option" aria-label="Anchor link for: the-build-std-option">🔗</a>The <code>build-std</code> Option</h4> <p>That’s where the <a href="https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std"><code>build-std</code> feature</a> of cargo comes in. It allows us to recompile <code>core</code> and other standard library crates on demand, instead of using the precompiled versions shipped with the Rust installation. This feature is very new and still not finished, so it is marked as “unstable” and only available on <a href="https://os.phil-opp.com/minimal-rust-kernel/#installing-rust-nightly">nightly Rust compilers</a>.</p> <p>To use the feature, we need to create a local <a href="https://doc.rust-lang.org/cargo/reference/config.html">cargo configuration</a> file at <code>.cargo/config.toml</code> (the <code>.cargo</code> folder should be next to your <code>src</code> folder) with the following content:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in .cargo/config.toml </span><span> </span><span>[</span><span style="color:#808080;">unstable</span><span>] </span><span style="color:#569cd6;">build-std </span><span>= [</span><span style="color:#d69d85;">&quot;core&quot;</span><span>, </span><span style="color:#d69d85;">&quot;compiler_builtins&quot;</span><span>] </span></code></pre> <p>This tells cargo that it should recompile the <code>core</code> and <code>compiler_builtins</code> libraries. The latter is required because it is a dependency of <code>core</code>. 
To recompile these libraries, cargo needs access to the Rust source code, which we can install with <code>rustup component add rust-src</code>.</p> <div class="note"> <p><strong>Note:</strong> The <code>unstable.build-std</code> configuration key requires at least the Rust nightly from 2020-07-15.</p> </div> <p>After setting the <code>unstable.build-std</code> configuration key and installing the <code>rust-src</code> component, we can rerun our build command:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo build --target x86_64-blog_os.json </span><span> Compiling core v0.0.0 (/…/rust/src/libcore) </span><span> Compiling rustc-std-workspace-core v1.99.0 (/…/rust/src/tools/rustc-std-workspace-core) </span><span> Compiling compiler_builtins v0.1.32 </span><span> Compiling blog_os v0.1.0 (/…/blog_os) </span><span> Finished dev [unoptimized + debuginfo] target(s) in 0.29 secs </span></code></pre> <p>We see that <code>cargo build</code> now recompiles the <code>core</code>, <code>rustc-std-workspace-core</code> (a dependency of <code>compiler_builtins</code>), and <code>compiler_builtins</code> libraries for our custom target.</p> <h4 id="memory-related-intrinsics"><a class="zola-anchor" href="#memory-related-intrinsics" aria-label="Anchor link for: memory-related-intrinsics">🔗</a>Memory-Related Intrinsics</h4> <p>The Rust compiler assumes that a certain set of built-in functions is available for all systems. Most of these functions are provided by the <code>compiler_builtins</code> crate that we just recompiled. However, there are some memory-related functions in that crate that are not enabled by default because they are normally provided by the C library on the system. These functions include <code>memset</code>, which sets all bytes in a memory block to a given value, <code>memcpy</code>, which copies one memory block to another, and <code>memcmp</code>, which compares two memory blocks. 
While we didn’t need any of these functions to compile our kernel right now, they will be required as soon as we add some more code to it (e.g. when copying structs around).</p> <p>Since we can’t link to the C library of the operating system, we need an alternative way to provide these functions to the compiler. One possible approach would be to implement our own <code>memset</code> etc. functions and apply the <code>#[no_mangle]</code> attribute to them (to avoid the automatic renaming during compilation). However, this is dangerous since the slightest mistake in the implementation of these functions could lead to undefined behavior. For example, implementing <code>memcpy</code> with a <code>for</code> loop may result in infinite recursion because <code>for</code> loops implicitly call the <a href="https://doc.rust-lang.org/stable/core/iter/trait.IntoIterator.html#tymethod.into_iter"><code>IntoIterator::into_iter</code></a> trait method, which may call <code>memcpy</code> again. So it’s a good idea to reuse existing, well-tested implementations instead.</p> <p>Fortunately, the <code>compiler_builtins</code> crate already contains implementations for all the needed functions; they are just disabled by default so that they don’t collide with the implementations from the C library. We can enable them by setting cargo’s <a href="https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std-features"><code>build-std-features</code></a> flag to <code>["compiler-builtins-mem"]</code>. Like the <code>build-std</code> flag, this flag can be either passed on the command line as a <code>-Z</code> flag or configured in the <code>unstable</code> table in the <code>.cargo/config.toml</code> file. 
Since we always want to build with this flag, the config file option makes more sense for us:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in .cargo/config.toml </span><span> </span><span>[</span><span style="color:#808080;">unstable</span><span>] </span><span style="color:#569cd6;">build-std-features </span><span>= [</span><span style="color:#d69d85;">&quot;compiler-builtins-mem&quot;</span><span>] </span><span style="color:#569cd6;">build-std </span><span>= [</span><span style="color:#d69d85;">&quot;core&quot;</span><span>, </span><span style="color:#d69d85;">&quot;compiler_builtins&quot;</span><span>] </span></code></pre> <p>(Support for the <code>compiler-builtins-mem</code> feature was only <a href="https://github.com/rust-lang/rust/pull/77284">added very recently</a>, so you need at least Rust nightly <code>2020-09-30</code> for it.)</p> <p>Behind the scenes, this flag enables the <a href="https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/Cargo.toml#L54-L55"><code>mem</code> feature</a> of the <code>compiler_builtins</code> crate. The effect of this is that the <code>#[no_mangle]</code> attribute is applied to the <a href="https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/src/mem.rs#L12-L69"><code>memcpy</code> etc. 
implementations</a> of the crate, which makes them available to the linker.</p> <p>With this change, our kernel has valid implementations for all compiler-required functions, so it will continue to compile even if our code gets more complex.</p> <h4 id="set-a-default-target"><a class="zola-anchor" href="#set-a-default-target" aria-label="Anchor link for: set-a-default-target">🔗</a>Set a Default Target</h4> <p>To avoid passing the <code>--target</code> parameter on every invocation of <code>cargo build</code>, we can override the default target. To do this, we add the following to our <a href="https://doc.rust-lang.org/cargo/reference/config.html">cargo configuration</a> file at <code>.cargo/config.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in .cargo/config.toml </span><span> </span><span>[</span><span style="color:#808080;">build</span><span>] </span><span style="color:#569cd6;">target </span><span>= </span><span style="color:#d69d85;">&quot;x86_64-blog_os.json&quot; </span></code></pre> <p>This tells <code>cargo</code> to use our <code>x86_64-blog_os.json</code> target when no explicit <code>--target</code> argument is passed. This means that we can now build our kernel with a simple <code>cargo build</code>. For more information on cargo configuration options, check out the <a href="https://doc.rust-lang.org/cargo/reference/config.html">official documentation</a>.</p> <p>We are now able to build our kernel for a bare-metal target with a simple <code>cargo build</code>. However, our <code>_start</code> entry point, which will be called by the bootloader, is still empty. 
It’s time to output something to the screen from it.</p> <h3 id="printing-to-screen"><a class="zola-anchor" href="#printing-to-screen" aria-label="Anchor link for: printing-to-screen">🔗</a>Printing to Screen</h3> <p>The easiest way to print text to the screen at this stage is the <a href="https://en.wikipedia.org/wiki/VGA-compatible_text_mode">VGA text buffer</a>. It is a special memory area mapped to the VGA hardware that contains the contents displayed on the screen. It normally consists of 25 lines that each contain 80 character cells. Each character cell displays an ASCII character with some foreground and background colors. The screen output looks like this:</p> <p><img src="https://upload.wikimedia.org/wikipedia/commons/f/f8/Codepage-437.png" alt="screen output for common ASCII characters" /></p> <p>We will discuss the exact layout of the VGA buffer in the next post, where we write a first small driver for it. For printing “Hello World!”, we just need to know that the buffer is located at address <code>0xb8000</code> and that each character cell consists of an ASCII byte and a color byte.</p> <p>The implementation looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">HELLO</span><span>: </span><span style="color:#569cd6;">&amp;</span><span>[</span><span style="color:#569cd6;">u8</span><span>] = </span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&quot;Hello World!&quot;</span><span>; </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>_start() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> vga_buffer = </span><span style="color:#b5cea8;">0xb8000 </span><span style="color:#569cd6;">as *mut u8</span><span>; </span><span> </span><span> </span><span style="color:#569cd6;">for </span><span>(i, </span><span style="color:#569cd6;">&amp;</span><span>byte) </span><span style="color:#569cd6;">in </span><span style="color:#b4cea8;">HELLO</span><span>.iter().enumerate() { </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> *vga_buffer.offset(i </span><span style="color:#569cd6;">as isize </span><span>* </span><span style="color:#b5cea8;">2</span><span>) = byte; </span><span> *vga_buffer.offset(i </span><span style="color:#569cd6;">as isize </span><span>* </span><span style="color:#b5cea8;">2 </span><span>+ </span><span style="color:#b5cea8;">1</span><span>) = </span><span style="color:#b5cea8;">0xb</span><span>; </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>First, we cast the integer <code>0xb8000</code> into a <a href="https://doc.rust-lang.org/stable/book/ch19-01-unsafe-rust.html#dereferencing-a-raw-pointer">raw pointer</a>. Then we <a href="https://doc.rust-lang.org/stable/book/ch13-02-iterators.html">iterate</a> over the bytes of the <a href="https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime">static</a> <code>HELLO</code> <a href="https://doc.rust-lang.org/reference/tokens.html#byte-string-literals">byte string</a>. We use the <a href="https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate"><code>enumerate</code></a> method to additionally get a running variable <code>i</code>. 
In the body of the for loop, we use the <a href="https://doc.rust-lang.org/std/primitive.pointer.html#method.offset"><code>offset</code></a> method to write the string byte and the corresponding color byte (<code>0xb</code> is a light cyan).</p> <p>Note that there’s an <a href="https://doc.rust-lang.org/stable/book/ch19-01-unsafe-rust.html"><code>unsafe</code></a> block around all memory writes. The reason is that the Rust compiler can’t prove that the raw pointers we create are valid. They could point anywhere and lead to data corruption. By putting them into an <code>unsafe</code> block, we’re basically telling the compiler that we are absolutely sure that the operations are valid. Note that an <code>unsafe</code> block does not turn off Rust’s safety checks. It only allows you to do <a href="https://doc.rust-lang.org/stable/book/ch19-01-unsafe-rust.html#unsafe-superpowers">five additional things</a>.</p> <p>I want to emphasize that <strong>this is not the way we want to do things in Rust!</strong> It’s very easy to mess up when working with raw pointers inside unsafe blocks. For example, we could easily write beyond the buffer’s end if we’re not careful.</p> <p>So we want to minimize the use of <code>unsafe</code> as much as possible. Rust gives us the ability to do this by creating safe abstractions. For example, we could create a VGA buffer type that encapsulates all unsafety and ensures that it is <em>impossible</em> to do anything wrong from the outside. This way, we would only need minimal amounts of <code>unsafe</code> code and can be sure that we don’t violate <a href="https://en.wikipedia.org/wiki/Memory_safety">memory safety</a>. We will create such a safe VGA buffer abstraction in the next post.</p> <h2 id="running-our-kernel"><a class="zola-anchor" href="#running-our-kernel" aria-label="Anchor link for: running-our-kernel">🔗</a>Running our Kernel</h2> <p>Now that we have an executable that does something perceptible, it is time to run it. 
First, we need to turn our compiled kernel into a bootable disk image by linking it with a bootloader. Then we can run the disk image in the <a href="https://www.qemu.org/">QEMU</a> virtual machine or boot it on real hardware using a USB stick.</p> <h3 id="creating-a-bootimage"><a class="zola-anchor" href="#creating-a-bootimage" aria-label="Anchor link for: creating-a-bootimage">🔗</a>Creating a Bootimage</h3> <p>To turn our compiled kernel into a bootable disk image, we need to link it with a bootloader. As we learned in the <a href="https://os.phil-opp.com/minimal-rust-kernel/#the-boot-process">section about booting</a>, the bootloader is responsible for initializing the CPU and loading our kernel.</p> <p>Instead of writing our own bootloader, which is a project on its own, we use the <a href="https://crates.io/crates/bootloader"><code>bootloader</code></a> crate. This crate implements a basic BIOS bootloader without any C dependencies, just Rust and inline assembly. To use it for booting our kernel, we need to add a dependency on it:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">bootloader </span><span>= </span><span style="color:#d69d85;">&quot;0.9&quot; </span></code></pre> <p><strong>Note:</strong> This post is only compatible with <code>bootloader v0.9</code>. Newer versions use a different build system and will result in build errors when following this post.</p> <p>Adding the bootloader as a dependency is not enough to actually create a bootable disk image. 
The problem is that we need to link our kernel with the bootloader after compilation, but cargo has no support for <a href="https://github.com/rust-lang/cargo/issues/545">post-build scripts</a>.</p> <p>To solve this problem, we created a tool named <code>bootimage</code> that first compiles the kernel and bootloader, and then links them together to create a bootable disk image. To install the tool, go into your home directory (or any directory outside of your cargo project) and execute the following command in your terminal:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>cargo install bootimage </span></code></pre> <p>For running <code>bootimage</code> and building the bootloader, you need to have the <code>llvm-tools-preview</code> rustup component installed. You can do so by executing <code>rustup component add llvm-tools-preview</code>.</p> <p>After installing <code>bootimage</code> and adding the <code>llvm-tools-preview</code> component, you can create a bootable disk image by going back into your cargo project directory and executing:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo bootimage </span></code></pre> <p>We see that the tool recompiles our kernel using <code>cargo build</code>, so it will automatically pick up any changes you make. Afterwards, it compiles the bootloader, which might take a while. Like all crate dependencies, it is only built once and then cached, so subsequent builds will be much faster. Finally, <code>bootimage</code> combines the bootloader and your kernel into a bootable disk image.</p> <p>After executing the command, you should see a bootable disk image named <code>bootimage-blog_os.bin</code> in your <code>target/x86_64-blog_os/debug</code> directory. You can boot it in a virtual machine or copy it to a USB drive to boot it on real hardware. 
(Note that this is not a CD image, which has a different format, so burning it to a CD doesn’t work).</p> <h4 id="how-does-it-work"><a class="zola-anchor" href="#how-does-it-work" aria-label="Anchor link for: how-does-it-work">🔗</a>How does it work?</h4> <p>The <code>bootimage</code> tool performs the following steps behind the scenes:</p> <ul> <li>It compiles our kernel to an <a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">ELF</a> file.</li> <li>It compiles the bootloader dependency as a standalone executable.</li> <li>It links the bytes of the kernel ELF file to the bootloader.</li> </ul> <p>When booted, the bootloader reads and parses the appended ELF file. It then maps the program segments to virtual addresses in the page tables, zeroes the <code>.bss</code> section, and sets up a stack. Finally, it reads the entry point address (our <code>_start</code> function) and jumps to it.</p> <h3 id="booting-it-in-qemu"><a class="zola-anchor" href="#booting-it-in-qemu" aria-label="Anchor link for: booting-it-in-qemu">🔗</a>Booting it in QEMU</h3> <p>We can now boot the disk image in a virtual machine. 
To boot it in <a href="https://www.qemu.org/">QEMU</a>, execute the following command:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; qemu-system-x86_64 -drive format=raw,file=target/x86_64-blog_os/debug/bootimage-blog_os.bin </span></code></pre> <p>This opens a separate window which should look similar to this:</p> <p><img src="https://os.phil-opp.com/minimal-rust-kernel/qemu.png" alt="QEMU showing “Hello World!”" /></p> <p>We see that our “Hello World!” is visible on the screen.</p> <h3 id="real-machine"><a class="zola-anchor" href="#real-machine" aria-label="Anchor link for: real-machine">🔗</a>Real Machine</h3> <p>It is also possible to write it to a USB stick and boot it on a real machine, <strong>but be careful</strong> to choose the correct device name, because <strong>everything on that device is overwritten</strong>:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; dd if=target/x86_64-blog_os/debug/bootimage-blog_os.bin of=/dev/sdX &amp;&amp; sync </span></code></pre> <p>Where <code>sdX</code> is the device name of your USB stick.</p> <p>After writing the image to the USB stick, you can run it on real hardware by booting from it. You probably need to use a special boot menu or change the boot order in your BIOS configuration to boot from the USB stick. 
Note that it currently doesn’t work for UEFI machines, since the <code>bootloader</code> crate has no UEFI support yet.</p> <h3 id="using-cargo-run"><a class="zola-anchor" href="#using-cargo-run" aria-label="Anchor link for: using-cargo-run">🔗</a>Using <code>cargo run</code></h3> <p>To make it easier to run our kernel in QEMU, we can set the <code>runner</code> configuration key for cargo:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in .cargo/config.toml </span><span> </span><span>[</span><span style="color:#808080;">target.</span><span style="color:#d69d85;">&#39;cfg(target_os = &quot;none&quot;)&#39;</span><span>] </span><span style="color:#569cd6;">runner </span><span>= </span><span style="color:#d69d85;">&quot;bootimage runner&quot; </span></code></pre> <p>The <code>target.'cfg(target_os = "none")'</code> table applies to all targets whose target configuration file’s <code>"os"</code> field is set to <code>"none"</code>. This includes our <code>x86_64-blog_os.json</code> target. The <code>runner</code> key specifies the command that should be invoked for <code>cargo run</code>. The command is run after a successful build with the executable path passed as the first argument. See the <a href="https://doc.rust-lang.org/cargo/reference/config.html">cargo documentation</a> for more details.</p> <p>The <code>bootimage runner</code> command is specifically designed to be usable as a <code>runner</code> executable. It links the given executable with the project’s bootloader dependency and then launches QEMU. 
See the <a href="https://github.com/rust-osdev/bootimage">Readme of <code>bootimage</code></a> for more details and possible configuration options.</p> <p>Now we can use <code>cargo run</code> to compile our kernel and boot it in QEMU.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>In the next post, we will explore the VGA text buffer in more detail and write a safe interface for it. We will also add support for the <code>println</code> macro.</p> Handling Exceptions Sun, 26 Mar 2017 00:00:00 +0000 https://os.phil-opp.com/handling-exceptions/ https://os.phil-opp.com/handling-exceptions/ <p>In this post, we start exploring CPU exceptions. Exceptions occur in various erroneous situations, for example when accessing an invalid memory address or when dividing by zero. To catch them, we have to set up an <em>interrupt descriptor table</em> that provides handler functions. At the end of this post, our kernel will be able to catch <a href="https://wiki.osdev.org/Exceptions#Breakpoint">breakpoint exceptions</a> and to resume normal execution afterwards.</p> <span id="continue-reading"></span> <p>As always, the complete source code is available on <a href="https://github.com/phil-opp/blog_os/tree/first_edition_post_9">GitHub</a>. Please file <a href="https://github.com/phil-opp/blog_os/issues">issues</a> for any problems, questions, or improvement suggestions. There is also a comment section at the end of this page.</p> <h2 id="exceptions"><a class="zola-anchor" href="#exceptions" aria-label="Anchor link for: exceptions">🔗</a>Exceptions</h2> <p>An exception signals that something is wrong with the current instruction. For example, the CPU issues an exception if the current instruction tries to divide by 0. 
When an exception occurs, the CPU interrupts its current work and immediately calls a specific exception handler function, depending on the exception type.</p> <p>We’ve already seen several types of exceptions in our kernel:</p> <ul> <li><strong>Invalid Opcode</strong>: This exception occurs when the current instruction is invalid. For example, this exception occurred when we tried to use SSE instructions before enabling SSE. Without SSE, the CPU didn’t know the <code>movups</code> and <code>movaps</code> instructions, so it threw an exception when it stumbled over them.</li> <li><strong>Page Fault</strong>: A page fault occurs on illegal memory accesses. For example, it occurs if the current instruction tries to read from an unmapped page or tries to write to a read-only page.</li> <li><strong>Double Fault</strong>: When an exception occurs, the CPU tries to call the corresponding handler function. If another exception occurs <em>while calling the exception handler</em>, the CPU raises a double fault exception. This exception also occurs when there is no handler function registered for an exception.</li> <li><strong>Triple Fault</strong>: If an exception occurs while the CPU tries to call the double fault handler function, it issues a fatal <em>triple fault</em>. We can’t catch or handle a triple fault. Most processors react by resetting themselves and rebooting the operating system. This causes the bootloops we experienced in the previous posts.</li> </ul> <p>For the full list of exceptions, check out the <a href="https://wiki.osdev.org/Exceptions">OSDev wiki</a>.</p> <h3 id="the-interrupt-descriptor-table"><a class="zola-anchor" href="#the-interrupt-descriptor-table" aria-label="Anchor link for: the-interrupt-descriptor-table">🔗</a>The Interrupt Descriptor Table</h3> <p>In order to catch and handle exceptions, we have to set up a so-called <em>Interrupt Descriptor Table</em> (IDT). In this table, we can specify a handler function for each CPU exception. 
The hardware uses this table directly, so we need to follow a predefined format. Each entry must have the following 16-byte structure:</p> <table><thead><tr><th>Type</th><th>Name</th><th>Description</th></tr></thead><tbody> <tr><td>u16</td><td>Function Pointer [0:15]</td><td>The lower bits of the pointer to the handler function.</td></tr> <tr><td>u16</td><td>GDT selector</td><td>Selector of a code segment in the GDT.</td></tr> <tr><td>u16</td><td>Options</td><td>(see below)</td></tr> <tr><td>u16</td><td>Function Pointer [16:31]</td><td>The middle bits of the pointer to the handler function.</td></tr> <tr><td>u32</td><td>Function Pointer [32:63]</td><td>The remaining bits of the pointer to the handler function.</td></tr> <tr><td>u32</td><td>Reserved</td><td></td></tr> </tbody></table> <p>The options field has the following format:</p> <table><thead><tr><th>Bits</th><th>Name</th><th>Description</th></tr></thead><tbody> <tr><td>0-2</td><td>Interrupt Stack Table Index</td><td>0: Don’t switch stacks, 1-7: Switch to the n-th stack in the Interrupt Stack Table when this handler is called.</td></tr> <tr><td>3-7</td><td>Reserved</td><td></td></tr> <tr><td>8</td><td>0: Interrupt Gate, 1: Trap Gate</td><td>If this bit is 0, interrupts are disabled when this handler is called.</td></tr> <tr><td>9-11</td><td>must be one</td><td></td></tr> <tr><td>12</td><td>must be zero</td><td></td></tr> <tr><td>13‑14</td><td>Descriptor Privilege Level (DPL)</td><td>The minimal privilege level required for calling this handler.</td></tr> <tr><td>15</td><td>Present</td><td></td></tr> </tbody></table> <p>Each exception has a predefined IDT index. For example the invalid opcode exception has table index 6 and the page fault exception has table index 14. Thus, the hardware can automatically load the corresponding IDT entry for each exception. 
The <a href="https://wiki.osdev.org/Exceptions">Exception Table</a> in the OSDev wiki shows the IDT indexes of all exceptions in the “Vector nr.” column.</p> <p>When an exception occurs, the CPU roughly does the following:</p> <ol> <li>Push some registers on the stack, including the instruction pointer and the <a href="https://en.wikipedia.org/wiki/FLAGS_register">RFLAGS</a> register. (We will use these values later in this post.)</li> <li>Read the corresponding entry from the Interrupt Descriptor Table (IDT). For example, the CPU reads the 14-th entry when a page fault occurs.</li> <li>Check if the entry is present. Raise a double fault if not.</li> <li>Disable interrupts if the entry is an interrupt gate (bit 40 not set).</li> <li>Load the specified GDT selector into the CS segment.</li> <li>Jump to the specified handler function.</li> </ol> <h2 id="an-idt-type"><a class="zola-anchor" href="#an-idt-type" aria-label="Anchor link for: an-idt-type">🔗</a>An IDT Type</h2> <p>Instead of creating our own IDT type, we will use the <a href="https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/struct.Idt.html"><code>Idt</code> struct</a> of the <code>x86_64</code> crate, which looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[repr(C)] </span><span style="color:#569cd6;">pub struct </span><span>Idt { </span><span> </span><span style="color:#569cd6;">pub </span><span>divide_by_zero: IdtEntry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>debug: IdtEntry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>non_maskable_interrupt: IdtEntry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>breakpoint: IdtEntry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>overflow: IdtEntry&lt;HandlerFunc&gt;, </span><span> 
</span><span style="color:#569cd6;">pub </span><span>bound_range_exceeded: IdtEntry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>invalid_opcode: IdtEntry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>device_not_available: IdtEntry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>double_fault: IdtEntry&lt;HandlerFuncWithErrCode&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>invalid_tss: IdtEntry&lt;HandlerFuncWithErrCode&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>segment_not_present: IdtEntry&lt;HandlerFuncWithErrCode&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>stack_segment_fault: IdtEntry&lt;HandlerFuncWithErrCode&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>general_protection_fault: IdtEntry&lt;HandlerFuncWithErrCode&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>page_fault: IdtEntry&lt;PageFaultHandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>x87_floating_point: IdtEntry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>alignment_check: IdtEntry&lt;HandlerFuncWithErrCode&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>machine_check: IdtEntry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>simd_floating_point: IdtEntry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>virtualization: IdtEntry&lt;HandlerFunc&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>security_exception: IdtEntry&lt;HandlerFuncWithErrCode&gt;, </span><span> </span><span style="color:#569cd6;">pub </span><span>interrupts: [IdtEntry&lt;HandlerFunc&gt;; 224], </span><span> </span><span style="color:#608b4e;">// some fields omitted </span><span>} 
</span></code></pre> <p>The fields have the type <a href="https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/struct.IdtEntry.html"><code>IdtEntry&lt;F&gt;</code></a>, which is a struct that represents the fields of an IDT entry (see the table above). The type parameter <code>F</code> defines the expected handler function type. We see that some entries require a <a href="https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/type.HandlerFunc.html"><code>HandlerFunc</code></a> and some entries require a <a href="https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/type.HandlerFuncWithErrCode.html"><code>HandlerFuncWithErrCode</code></a>. The page fault even has its own special type: <a href="https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/type.PageFaultHandlerFunc.html"><code>PageFaultHandlerFunc</code></a>.</p> <p>Let’s look at the <code>HandlerFunc</code> type first:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">HandlerFunc </span><span>= </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;x86-interrupt&quot; </span><span style="color:#569cd6;">fn</span><span>(</span><span style="color:#569cd6;">_</span><span>: </span><span style="color:#569cd6;">&amp;mut</span><span> ExceptionStackFrame); </span></code></pre> <p>It’s a <a href="https://doc.rust-lang.org/book/type-aliases.html">type alias</a> for an <code>extern "x86-interrupt" fn</code> type. The <code>extern</code> keyword defines a function with a <a href="https://doc.rust-lang.org/1.30.0/book/first-edition/ffi.html#foreign-calling-conventions">foreign calling convention</a> and is often used to communicate with C code (<code>extern "C" fn</code>). 
But what is the <code>x86-interrupt</code> calling convention?</p> <h2 id="the-interrupt-calling-convention"><a class="zola-anchor" href="#the-interrupt-calling-convention" aria-label="Anchor link for: the-interrupt-calling-convention">🔗</a>The Interrupt Calling Convention</h2> <p>Exceptions are quite similar to function calls: The CPU jumps to the first instruction of the called function and executes it. Afterwards, if the function is not diverging, the CPU jumps to the return address and continues the execution of the parent function.</p> <p>However, there is a major difference between exceptions and function calls: A function call is invoked voluntarily by a compiler-inserted <code>call</code> instruction, while an exception might occur at <em>any</em> instruction. In order to understand the consequences of this difference, we need to examine function calls in more detail.</p> <p><a href="https://en.wikipedia.org/wiki/Calling_convention">Calling conventions</a> specify the details of a function call. For example, they specify where function parameters are placed (e.g. in registers or on the stack) and how results are returned. On x86_64 Linux, the following rules apply to C functions (specified in the <a href="https://refspecs.linuxbase.org/elf/gabi41.pdf">System V ABI</a>):</p> <ul> <li>the first six integer arguments are passed in registers <code>rdi</code>, <code>rsi</code>, <code>rdx</code>, <code>rcx</code>, <code>r8</code>, <code>r9</code></li> <li>additional arguments are passed on the stack</li> <li>results are returned in <code>rax</code> and <code>rdx</code></li> </ul> <p>Note that Rust does not follow the C ABI (in fact, <a href="https://github.com/rust-lang/rfcs/issues/600">there isn’t even a Rust ABI yet</a>). 
So these rules apply only to functions declared as <code>extern "C" fn</code>.</p> <h3 id="preserved-and-scratch-registers"><a class="zola-anchor" href="#preserved-and-scratch-registers" aria-label="Anchor link for: preserved-and-scratch-registers">🔗</a>Preserved and Scratch Registers</h3> <p>The calling convention divides the registers into two parts: <em>preserved</em> and <em>scratch</em> registers.</p> <p>The values of <em>preserved</em> registers must remain unchanged across function calls. So a called function (the <em>“callee”</em>) is only allowed to overwrite these registers if it restores their original values before returning. Therefore, these registers are called <em>“callee-saved”</em>. A common pattern is to save these registers to the stack at the function’s beginning and restore them just before returning.</p> <p>In contrast, a called function is allowed to overwrite <em>scratch</em> registers without restrictions. If the caller wants to preserve the value of a scratch register across a function call, it needs to back it up before the function call (e.g. by pushing it to the stack) and restore it afterwards. So the scratch registers are <em>caller-saved</em>.</p> <p>On x86_64, the C calling convention specifies the following preserved and scratch registers:</p> <table><thead><tr><th>preserved registers</th><th>scratch registers</th></tr></thead><tbody> <tr><td><code>rbp</code>, <code>rbx</code>, <code>rsp</code>, <code>r12</code>, <code>r13</code>, <code>r14</code>, <code>r15</code></td><td><code>rax</code>, <code>rcx</code>, <code>rdx</code>, <code>rsi</code>, <code>rdi</code>, <code>r8</code>, <code>r9</code>, <code>r10</code>, <code>r11</code></td></tr> <tr><td><em>callee-saved</em></td><td><em>caller-saved</em></td></tr> </tbody></table> <p>The compiler knows these rules, so it generates the code accordingly. 
For example, most functions begin with a <code>push rbp</code>, which backs up <code>rbp</code> on the stack (because it’s a callee-saved register).</p> <h3 id="preserving-all-registers"><a class="zola-anchor" href="#preserving-all-registers" aria-label="Anchor link for: preserving-all-registers">🔗</a>Preserving all Registers</h3> <p>In contrast to function calls, exceptions can occur on <em>any</em> instruction. In most cases, we don’t even know at compile time if the generated code will cause an exception. For example, the compiler can’t know if an instruction causes a stack overflow or a page fault.</p> <p>Since we don’t know when an exception occurs, we can’t back up any registers beforehand. This means that we can’t use a calling convention that relies on caller-saved registers for exception handlers. Instead, we need a calling convention that preserves <em>all registers</em>. The <code>x86-interrupt</code> calling convention is such a calling convention, so it guarantees that all register values are restored to their original values on function return.</p> <h3 id="the-exception-stack-frame"><a class="zola-anchor" href="#the-exception-stack-frame" aria-label="Anchor link for: the-exception-stack-frame">🔗</a>The Exception Stack Frame</h3> <p>On a normal function call (using the <code>call</code> instruction), the CPU pushes the return address before jumping to the target function. On function return (using the <code>ret</code> instruction), the CPU pops this return address and jumps to it. So the stack frame of a normal function call looks like this:</p> <p><img src="https://os.phil-opp.com/handling-exceptions/function-stack-frame.svg" alt="function stack frame" /></p> <p>For exception and interrupt handlers, however, pushing a return address would not suffice, since interrupt handlers often run in a different context (stack pointer, CPU flags, etc.). 
Instead, the CPU performs the following steps when an interrupt occurs:</p> <ol> <li><strong>Aligning the stack pointer</strong>: An interrupt can occur at any instruction, so the stack pointer can have any value, too. However, some CPU instructions (e.g. some SSE instructions) require that the stack pointer is aligned on a 16-byte boundary, so the CPU performs such an alignment right after the interrupt.</li> <li><strong>Switching stacks</strong> (in some cases): A stack switch occurs when the CPU privilege level changes, for example when a CPU exception occurs in a user-mode program. It is also possible to configure stack switches for specific interrupts using the so-called <em>Interrupt Stack Table</em> (described in the next post).</li> <li><strong>Pushing the old stack pointer</strong>: The CPU pushes the values of the stack pointer (<code>rsp</code>) and the stack segment (<code>ss</code>) registers at the time when the interrupt occurred (before the alignment). This makes it possible to restore the original stack pointer when returning from an interrupt handler.</li> <li><strong>Pushing and updating the <code>RFLAGS</code> register</strong>: The <a href="https://en.wikipedia.org/wiki/FLAGS_register"><code>RFLAGS</code></a> register contains various control and status bits. On interrupt entry, the CPU changes some bits and pushes the old value.</li> <li><strong>Pushing the instruction pointer</strong>: Before jumping to the interrupt handler function, the CPU pushes the instruction pointer (<code>rip</code>) and the code segment (<code>cs</code>). 
This is comparable to the return address push of a normal function call.</li> <li><strong>Pushing an error code</strong> (for some exceptions): For some specific exceptions such as page faults, the CPU pushes an error code, which describes the cause of the exception.</li> <li><strong>Invoking the interrupt handler</strong>: The CPU reads the address and the segment descriptor of the interrupt handler function from the corresponding field in the IDT. It then invokes this handler by loading the values into the <code>rip</code> and <code>cs</code> registers.</li> </ol> <p>So the <em>exception stack frame</em> looks like this:</p> <p><img src="https://os.phil-opp.com/handling-exceptions/exception-stack-frame.svg" alt="exception stack frame" /></p> <p>In the <code>x86_64</code> crate, the exception stack frame is represented by the <a href="https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/struct.ExceptionStackFrame.html"><code>ExceptionStackFrame</code></a> struct. It is passed to interrupt handlers as <code>&amp;mut</code> and can be used to retrieve additional information about the exception’s cause. The struct contains no error code field, since only a few exceptions push an error code. These exceptions use the separate <a href="https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/type.HandlerFuncWithErrCode.html"><code>HandlerFuncWithErrCode</code></a> function type, which has an additional <code>error_code</code> argument.</p> <h3 id="behind-the-scenes"><a class="zola-anchor" href="#behind-the-scenes" aria-label="Anchor link for: behind-the-scenes">🔗</a>Behind the Scenes</h3> <p>The <code>x86-interrupt</code> calling convention is a powerful abstraction that hides almost all of the messy details of the exception handling process. However, sometimes it’s useful to know what’s happening behind the curtain. 
Here is a short overview of the things that the <code>x86-interrupt</code> calling convention takes care of:</p> <ul> <li><strong>Retrieving the arguments</strong>: Most calling conventions expect that the arguments are passed in registers. This is not possible for exception handlers, since we must not overwrite any register values before backing them up on the stack. Instead, the <code>x86-interrupt</code> calling convention is aware that the arguments already lie on the stack at a specific offset.</li> <li><strong>Returning using <code>iretq</code></strong>: Since the exception stack frame completely differs from stack frames of normal function calls, we can’t return from handler functions through the normal <code>ret</code> instruction. Instead, the <code>iretq</code> instruction must be used.</li> <li><strong>Handling the error code</strong>: The error code, which is pushed for some exceptions, makes things much more complex. It changes the stack alignment (see the next point) and needs to be popped off the stack before returning. The <code>x86-interrupt</code> calling convention handles all that complexity. However, it doesn’t know which handler function is used for which exception, so it needs to deduce that information from the number of function arguments. That means that the programmer is still responsible for using the correct function type for each exception. Luckily, the <code>Idt</code> type defined by the <code>x86_64</code> crate ensures that the correct function types are used.</li> <li><strong>Aligning the stack</strong>: There are some instructions (especially SSE instructions) that require a 16-byte stack alignment. The CPU ensures this alignment whenever an exception occurs, but for some exceptions it destroys it again later when it pushes an error code. 
The <code>x86-interrupt</code> calling convention takes care of this by realigning the stack in this case.</li> </ul> <p>If you are interested in more details: We also have a series of posts that explain exception handling using <a href="https://github.com/rust-lang/rfcs/blob/master/text/1201-naked-fns.md">naked functions</a>, linked <a href="https://os.phil-opp.com/handling-exceptions/#too-much-magic">at the end of this post</a>.</p> <h2 id="implementation"><a class="zola-anchor" href="#implementation" aria-label="Anchor link for: implementation">🔗</a>Implementation</h2> <p>Now that we’ve understood the theory, it’s time to handle CPU exceptions in our kernel. We start by creating a new <code>interrupts</code> module:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span style="color:#569cd6;">... </span><span style="color:#569cd6;">mod </span><span>interrupts; </span><span style="color:#569cd6;">... </span></code></pre> <p>In the new module, we create an <code>init</code> function that creates a new <code>Idt</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::idt::Idt; </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init() { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = Idt::new(); </span><span>} </span></code></pre> <p>Now we can add handler functions. We start by adding a handler for the <a href="https://wiki.osdev.org/Exceptions#Breakpoint">breakpoint exception</a>. The breakpoint exception is the perfect exception to test exception handling. 
Its only purpose is to temporarily pause a program when the breakpoint instruction <code>int3</code> is executed.</p> <p>The breakpoint exception is commonly used in debuggers: When the user sets a breakpoint, the debugger overwrites the corresponding instruction with the <code>int3</code> instruction so that the CPU throws the breakpoint exception when it reaches that line. When the user wants to continue the program, the debugger replaces the <code>int3</code> instruction with the original instruction again and continues the program. For more details, see the <a href="https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints">“<em>How debuggers work</em>”</a> series.</p> <p>For our use case, we don’t need to overwrite any instructions (it wouldn’t even be possible since we <a href="https://os.phil-opp.com/remap-the-kernel/#using-the-correct-flags">set the page table flags</a> to read-only). Instead, we just want to print a message when the breakpoint instruction is executed and then continue the program.</p> <p>So let’s create a simple <code>breakpoint_handler</code> function and add it to our IDT:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::idt::ExceptionStackFrame; </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init() { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = Idt::new(); </span><span> idt.breakpoint.set_handler_fn(breakpoint_handler); </span><span>} </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;x86-interrupt&quot; </span><span style="color:#569cd6;">fn </span><span>breakpoint_handler( </span><span> stack_frame: </span><span style="color:#569cd6;">&amp;mut</span><span> ExceptionStackFrame) 
</span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;EXCEPTION: BREAKPOINT</span><span style="color:#e3bbab;">\n</span><span style="color:#b4cea8;">{:#?}</span><span style="color:#d69d85;">&quot;</span><span>, stack_frame); </span><span>} </span></code></pre> <p>Our handler just outputs a message and pretty-prints the exception stack frame.</p> <p>When we try to compile it, the following error occurs:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: x86-interrupt ABI is experimental and subject to change (see issue #40180) </span><span> --&gt; src/interrupts.rs:8:1 </span><span> | </span><span>8 | extern &quot;x86-interrupt&quot; fn breakpoint_handler( </span><span> | _^ starting here... </span><span>9 | | stack_frame: &amp;mut ExceptionStackFrame) </span><span>10 | | { </span><span>11 | | println!(&quot;EXCEPTION: BREAKPOINT\n{:#?}&quot;, stack_frame); </span><span>12 | | } </span><span> | |_^ ...ending here </span><span> | </span><span> = help: add #![feature(abi_x86_interrupt)] to the crate attributes to enable </span></code></pre> <p>This error occurs because the <code>x86-interrupt</code> calling convention is still unstable. To use it anyway, we have to explicitly enable it by adding <code>#![feature(abi_x86_interrupt)]</code> at the top of our <code>lib.rs</code>.</p> <h3 id="loading-the-idt"><a class="zola-anchor" href="#loading-the-idt" aria-label="Anchor link for: loading-the-idt">🔗</a>Loading the IDT</h3> <p>For the CPU to use our new interrupt descriptor table, we need to load it using the <a href="https://www.felixcloutier.com/x86/lgdt:lidt"><code>lidt</code></a> instruction. The <code>Idt</code> struct of the <code>x86_64</code> crate provides a <a href="https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/struct.Idt.html#method.load"><code>load</code></a> method for that. 
Let’s try to use it:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>init() { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = Idt::new(); </span><span> idt.breakpoint.set_handler_fn(breakpoint_handler); </span><span> idt.load(); </span><span>} </span></code></pre> <p>When we try to compile it now, the following error occurs:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: `idt` does not live long enough </span><span> --&gt; src/interrupts/mod.rs:43:5 </span><span> | </span><span>43 | idt.load(); </span><span> | ^^^ does not live long enough </span><span>44 | } </span><span> | - borrowed value only lives until here </span><span> | </span><span> = note: borrowed value must be valid for the static lifetime... </span></code></pre> <p>So the <code>load</code> method expects a <code>&amp;'static self</code>, that is, a reference that is valid for the complete runtime of the program. The reason is that the CPU will access this table on every interrupt until we load a different IDT. So using a shorter lifetime than <code>'static</code> could lead to use-after-free bugs.</p> <p>In fact, this is exactly what happens here. Our <code>idt</code> is created on the stack, so it is only valid inside the <code>init</code> function. Afterwards the stack memory is reused for other functions, so the CPU would interpret random stack memory as an IDT. Luckily, the <code>Idt::load</code> method encodes this lifetime requirement in its function definition, so that the Rust compiler is able to prevent this possible bug at compile time.</p> <p>In order to fix this problem, we need to store our <code>idt</code> at a place where it has a <code>'static</code> lifetime. 
To achieve this, we could either allocate our IDT on the heap using <code>Box</code> and then convert it to a <code>'static</code> reference or we can store the IDT as a <code>static</code>. Let’s try the latter:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">IDT</span><span>: Idt = Idt::new(); </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init() { </span><span> </span><span style="color:#b4cea8;">IDT</span><span>.breakpoint.set_handler_fn(breakpoint_handler); </span><span> </span><span style="color:#b4cea8;">IDT</span><span>.load(); </span><span>} </span></code></pre> <p>There are two problems with this. First, statics are immutable, so we can’t modify the breakpoint entry from our <code>init</code> function. Second, the <code>Idt::new</code> function is not a <a href="https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md"><code>const</code> function</a>, so it can’t be used to initialize a <code>static</code>. 
We could solve this problem by using a <a href="https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable"><code>static mut</code></a> of type <code>Option&lt;Idt&gt;</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">static mut </span><span style="color:#b4cea8;">IDT</span><span>: Option&lt;Idt&gt; = None; </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init() { </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#b4cea8;">IDT </span><span>= Some(Idt::new()); </span><span> </span><span style="color:#569cd6;">let</span><span> idt = </span><span style="color:#b4cea8;">IDT</span><span>.as_mut().unwrap(); </span><span> idt.breakpoint.set_handler_fn(breakpoint_handler); </span><span> idt.load(); </span><span> } </span><span>} </span></code></pre> <p>This variant compiles without errors but it’s far from idiomatic. <code>static mut</code>s are very prone to data races, so we need an <a href="https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#unsafe-superpowers"><code>unsafe</code> block</a> on each access. Also, we need to explicitly <code>unwrap</code> the <code>IDT</code> on each use, since it might be <code>None</code>.</p> <h4 id="lazy-statics-to-the-rescue"><a class="zola-anchor" href="#lazy-statics-to-the-rescue" aria-label="Anchor link for: lazy-statics-to-the-rescue">🔗</a>Lazy Statics to the Rescue</h4> <p>The one-time initialization of statics with non-const functions is a common problem in Rust. Fortunately, there already exists a good solution in a crate named <a href="https://docs.rs/lazy_static/0.2.4/lazy_static/">lazy_static</a>. 
This crate provides a <code>lazy_static!</code> macro that defines a lazily initialized <code>static</code>. Instead of computing its value at compile time, the <code>static</code> lazily initializes itself when it’s accessed for the first time. Thus, the initialization happens at runtime so that arbitrarily complex initialization code is possible.</p> <p>Let’s add the <code>lazy_static</code> crate to our project:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span>#[macro_use] </span><span style="color:#569cd6;">extern crate</span><span> lazy_static; </span></code></pre> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies.lazy_static</span><span>] </span><span style="color:#569cd6;">version </span><span>= </span><span style="color:#d69d85;">&quot;0.2.4&quot; </span><span style="color:#569cd6;">features </span><span>= [</span><span style="color:#d69d85;">&quot;spin_no_std&quot;</span><span>] </span></code></pre> <p>We need the <code>spin_no_std</code> feature, since we don’t link the standard library. We also need the <code>#[macro_use]</code> attribute on the <code>extern crate</code> line to import the <code>lazy_static!</code> macro.</p> <p>Now we can create our static IDT using <code>lazy_static</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>lazy_static! 
{ </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">IDT</span><span>: Idt = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = Idt::new(); </span><span> idt.breakpoint.set_handler_fn(breakpoint_handler); </span><span> idt </span><span> }; </span><span>} </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init() { </span><span> </span><span style="color:#b4cea8;">IDT</span><span>.load(); </span><span>} </span></code></pre> <p>Note how this solution requires no <code>unsafe</code> blocks or <code>unwrap</code> calls.</p> <blockquote> <h5 id="aside-how-does-the-lazy-static-macro-work"><a class="zola-anchor" href="#aside-how-does-the-lazy-static-macro-work" aria-label="Anchor link for: aside-how-does-the-lazy-static-macro-work">🔗</a>Aside: How does the <code>lazy_static!</code> macro work?</h5> <p>The macro generates a <code>static</code> of type <code>Once&lt;Idt&gt;</code>. The <a href="https://docs.rs/spin/0.4.5/spin/struct.Once.html"><code>Once</code></a> type is provided by the <code>spin</code> crate and allows deferred one-time initialization. It is implemented using an <a href="https://doc.rust-lang.org/nightly/core/sync/atomic/struct.AtomicUsize.html"><code>AtomicUsize</code></a> for synchronization and an <a href="https://doc.rust-lang.org/nightly/core/cell/struct.UnsafeCell.html"><code>UnsafeCell</code></a> for storing the (possibly uninitialized) value. So this solution also uses <code>unsafe</code> behind the scenes, but it is abstracted away in a safe interface.</p> </blockquote> <h3 id="testing-it"><a class="zola-anchor" href="#testing-it" aria-label="Anchor link for: testing-it">🔗</a>Testing it</h3> <p>Now we should be able to handle breakpoint exceptions! 
Let’s try it in our <code>rust_main</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>rust_main(...) { </span><span> </span><span style="color:#569cd6;">... </span><span> memory::init(boot_info); </span><span> </span><span> </span><span style="color:#608b4e;">// initialize our IDT </span><span> interrupts::init(); </span><span> </span><span> </span><span style="color:#608b4e;">// invoke a breakpoint exception </span><span> x86_64::instructions::interrupts::int3(); </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>When we run it in QEMU now (using <code>make run</code>), we see the following:</p> <p><img src="https://os.phil-opp.com/handling-exceptions/qemu-breakpoint-exception.png" alt="QEMU printing EXCEPTION: BREAKPOINT and the exception stack frame" /></p> <p>It works! The CPU successfully invokes our breakpoint handler, which prints the message, and then returns back to the <code>rust_main</code> function, where the <code>It did not crash!</code> message is printed.</p> <blockquote> <p><strong>Aside</strong>: If it doesn’t work and a boot loop occurs, this might be caused by a kernel stack overflow. Try increasing the stack size to at least 16kB (4096 * 4 bytes) in the <code>boot.asm</code> file.</p> </blockquote> <p>We see that the exception stack frame tells us the instruction and stack pointers at the time when the exception occurred. This information is very useful when debugging unexpected exceptions. 
For example, we can look at the corresponding assembly line using <code>objdump</code>:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; objdump -d build/kernel-x86_64.bin | grep -B5 &quot;1140a6:&quot; </span><span>00000000001140a0 &lt;x86_64::instructions::interrupts::int3::h015bf61815bb8afe&gt;: </span><span> 1140a0: 55 push %rbp </span><span> 1140a1: 48 89 e5 mov %rsp,%rbp </span><span> 1140a4: 50 push %rax </span><span> 1140a5: cc int3 </span><span> 1140a6: 48 83 c4 08 add $0x8,%rsp </span></code></pre> <p>The <code>-d</code> flag disassembles the <code>code</code> section and the <code>-C</code> flag makes function names more readable by <a href="https://en.wikipedia.org/wiki/Name_mangling">demangling</a> them. The <code>-B</code> flag of <code>grep</code> specifies the number of preceding lines that should be shown (5 in our case).</p> <p>We clearly see the <code>int3</code> instruction that caused the breakpoint exception at address <code>1140a5</code>. Wait… the stored instruction pointer was <code>1140a6</code>, which is a normal <code>add</code> operation. What’s happening here?</p> <h3 id="faults-aborts-and-traps"><a class="zola-anchor" href="#faults-aborts-and-traps" aria-label="Anchor link for: faults-aborts-and-traps">🔗</a>Faults, Aborts, and Traps</h3> <p>The answer is that the stored instruction pointer only points to the causing instruction for <em>fault</em> type exceptions, but not for <em>trap</em> or <em>abort</em> type exceptions. The difference between these types is the following:</p> <ul> <li><strong>Faults</strong> are exceptions that can be corrected so that the program can continue as if nothing happened. An example is the <a href="https://wiki.osdev.org/Exceptions#Page_Fault">page fault</a>, which can often be resolved by loading the accessed page from the disk into memory.</li> <li><strong>Aborts</strong> are fatal exceptions that can’t be recovered from. 
Examples are the <a href="https://wiki.osdev.org/Exceptions#Machine_Check">machine check exception</a> or the <a href="https://wiki.osdev.org/Exceptions#Double_Fault">double fault</a>.</li> <li><strong>Traps</strong> are only reported to the kernel, but don’t hinder the continuation of the program. Examples are the breakpoint exception and the <a href="https://wiki.osdev.org/Exceptions#Overflow">overflow exception</a>.</li> </ul> <p>The reason for the different instruction pointer values is that the stored value is also the return address. So for faults, the instruction that caused the exception is restarted and might cause the same exception again if it’s not resolved. This would not make much sense for traps, since invoking the breakpoint exception again would just cause another breakpoint exception<sup class="footnote-reference"><a href="#fn-breakpoint-restart-use-cases">1</a></sup>. Thus the instruction pointer points to the <em>next</em> instruction for these exceptions.</p> <p>In some cases, the distinction between faults and traps is vague. For example, the <a href="https://wiki.osdev.org/Exceptions#Debug">debug exception</a> behaves like a fault in some cases, but like a trap in others. So to find out the meaning of the saved instruction pointer, it is a good idea to read the official documentation for the exception, which can be found in the <a href="https://www.amd.com/system/files/TechDocs/24593.pdf">AMD64 manual</a> in Section 8.2. For example, for the breakpoint exception it says:</p> <blockquote> <p><code>#BP</code> is a trap-type exception. 
The saved instruction pointer points to the byte after the <code>INT3</code> instruction.</p> </blockquote> <p>The documentation of the <a href="https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/struct.Idt.html"><code>Idt</code></a> struct and the <a href="https://wiki.osdev.org/Exceptions">OSDev Wiki</a> also contain this information.</p> <h2 id="too-much-magic"><a class="zola-anchor" href="#too-much-magic" aria-label="Anchor link for: too-much-magic">🔗</a>Too much Magic?</h2> <p>The <code>x86-interrupt</code> calling convention and the <a href="https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/struct.Idt.html"><code>Idt</code></a> type made the exception handling process relatively straightforward and painless. If this was too much magic for you and you like to learn all the gory details of exception handling, we got you covered: Our <a href="https://os.phil-opp.com/edition-1/extra/naked-exceptions/">“Handling Exceptions with Naked Functions”</a> series shows how to handle exceptions without the <code>x86-interrupt</code> calling convention and also creates its own <code>Idt</code> type. Historically, these posts were the main exception handling posts before the <code>x86-interrupt</code> calling convention and the <code>x86_64</code> crate existed.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>We’ve successfully caught our first exception and returned from it! The next step is to add handlers for other common exceptions such as page faults. We also need to make sure that we never cause a <a href="https://wiki.osdev.org/Triple_Fault">triple fault</a>, since it causes a complete system reset. 
The next post explains how we can avoid this by correctly catching <a href="https://wiki.osdev.org/Double_Fault#Double_Fault">double faults</a>.</p> <h2 id="footnotes"><a class="zola-anchor" href="#footnotes" aria-label="Anchor link for: footnotes">🔗</a>Footnotes</h2> <div class="footnote-definition" id="fn-breakpoint-restart-use-cases"><sup class="footnote-definition-label">1</sup> <p>There are valid use cases for restarting an instruction that caused a breakpoint. The most common use case is a debugger: When setting a breakpoint on some code line, the debugger overwrites the corresponding instruction with an <code>int3</code> instruction, so that the CPU traps when that line is executed. When the user continues execution, the debugger swaps in the original instruction and continues the program from the replaced instruction.</p> </div> Double Faults Mon, 02 Jan 2017 00:00:00 +0000 https://os.phil-opp.com/double-faults/ https://os.phil-opp.com/double-faults/ <p>In this post we explore double faults in detail. We also set up an <em>Interrupt Stack Table</em> to catch double faults on a separate kernel stack. This way, we can completely prevent triple faults, even on kernel stack overflow.</p> <span id="continue-reading"></span> <p>As always, the complete source code is available on <a href="https://github.com/phil-opp/blog_os/tree/first_edition_post_10">GitHub</a>. Please file <a href="https://github.com/phil-opp/blog_os/issues">issues</a> for any problems, questions, or improvement suggestions. There is also a <a href="https://gitter.im/phil-opp/blog_os">gitter chat</a> and a comment section at the end of this page.</p> <h2 id="what-is-a-double-fault"><a class="zola-anchor" href="#what-is-a-double-fault" aria-label="Anchor link for: what-is-a-double-fault">🔗</a>What is a Double Fault?</h2> <p>In simplified terms, a double fault is a special exception that occurs when the CPU fails to invoke an exception handler. 
For example, it occurs when a page fault is triggered but there is no page fault handler registered in the <a href="https://os.phil-opp.com/handling-exceptions/#the-interrupt-descriptor-table">Interrupt Descriptor Table</a> (IDT). So it’s kind of similar to catch-all blocks in programming languages with exceptions, e.g. <code>catch(...)</code> in C++ or <code>catch(Exception e)</code> in Java or C#.</p> <p>A double fault behaves like a normal exception. It has the vector number <code>8</code> and we can define a normal handler function for it in the IDT. It is really important to provide a double fault handler, because if a double fault is unhandled a fatal <em>triple fault</em> occurs. Triple faults can’t be caught and most hardware reacts with a system reset.</p> <h3 id="triggering-a-double-fault"><a class="zola-anchor" href="#triggering-a-double-fault" aria-label="Anchor link for: triggering-a-double-fault">🔗</a>Triggering a Double Fault</h3> <p>Let’s provoke a double fault by triggering an exception for which we didn’t define a handler function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>rust_main(multiboot_information_address: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> </span><span style="color:#569cd6;">... 
</span><span> </span><span style="color:#608b4e;">// initialize our IDT </span><span> interrupts::init(); </span><span> </span><span> </span><span style="color:#608b4e;">// trigger a page fault </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> *(</span><span style="color:#b5cea8;">0xdeadbeaf </span><span style="color:#569cd6;">as *mut u64</span><span>) = </span><span style="color:#b5cea8;">42</span><span>; </span><span> }; </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>We try to write to address <code>0xdeadbeaf</code>, but the corresponding page is not present in the page tables. Thus, a page fault occurs. We haven’t registered a page fault handler in our <a href="https://os.phil-opp.com/handling-exceptions/#the-interrupt-descriptor-table">IDT</a>, so a double fault occurs.</p> <p>When we start our kernel now, we see that it enters an endless boot loop:</p> <p><img src="https://os.phil-opp.com/double-faults/boot-loop.gif" alt="boot loop" /></p> <p>The reason for the boot loop is the following:</p> <ol> <li>The CPU tries to write to <code>0xdeadbeaf</code>, which causes a page fault.</li> <li>The CPU looks at the corresponding entry in the IDT and sees that the present bit isn’t set. Thus, it can’t call the page fault handler and a double fault occurs.</li> <li>The CPU looks at the IDT entry of the double fault handler, but this entry is also non-present. Thus, a <em>triple</em> fault occurs.</li> <li>A triple fault is fatal. QEMU reacts to it like most real hardware and issues a system reset.</li> </ol> <p>So in order to prevent this triple fault, we need to either provide a handler function for page faults or a double fault handler. 
Let’s start with the latter, since we want to avoid triple faults in all cases.</p> <h3 id="a-double-fault-handler"><a class="zola-anchor" href="#a-double-fault-handler" aria-label="Anchor link for: a-double-fault-handler">🔗</a>A Double Fault Handler</h3> <p>A double fault is a normal exception with an error code, so we can define a handler function similar to our breakpoint handler, with an additional error code argument:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span>lazy_static! { </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">IDT</span><span>: idt::Idt = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = idt::Idt::new(); </span><span> </span><span> idt.breakpoint.set_handler_fn(breakpoint_handler); </span><span> idt.double_fault.set_handler_fn(double_fault_handler); </span><span> </span><span> idt </span><span> }; </span><span>} </span><span> </span><span style="color:#608b4e;">// our new double fault handler </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;x86-interrupt&quot; </span><span style="color:#569cd6;">fn </span><span>double_fault_handler( </span><span> stack_frame: </span><span style="color:#569cd6;">&amp;mut</span><span> ExceptionStackFrame, _error_code: </span><span style="color:#569cd6;">u64</span><span>) </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">EXCEPTION: DOUBLE FAULT</span><span style="color:#e3bbab;">\n</span><span style="color:#b4cea8;">{:#?}</span><span style="color:#d69d85;">&quot;</span><span>, stack_frame); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>Our handler prints a short error message and dumps 
the exception stack frame. The error code of the double fault exception is always zero, so there’s no reason to print it.</p> <p>When we start our kernel now, we should see that the double fault handler is invoked:</p> <p><img src="https://os.phil-opp.com/double-faults/qemu-catch-double-fault.png" alt="QEMU printing EXCEPTION: DOUBLE FAULT and the exception stack frame" /></p> <p>It worked! Here is what happens this time:</p> <ol> <li>The CPU tries to write to <code>0xdeadbeaf</code>, which causes a page fault.</li> <li>Like before, the CPU looks at the corresponding entry in the IDT and sees that the present bit isn’t set. Thus, a double fault occurs.</li> <li>The CPU jumps to the – now present – double fault handler.</li> </ol> <p>The triple fault (and the boot-loop) no longer occurs, since the CPU can now call the double fault handler.</p> <p>That was quite straightforward! So why do we need a whole post for this topic? Well, we’re now able to catch <em>most</em> double faults, but there are some cases where our current approach doesn’t suffice.</p> <h2 id="causes-of-double-faults"><a class="zola-anchor" href="#causes-of-double-faults" aria-label="Anchor link for: causes-of-double-faults">🔗</a>Causes of Double Faults</h2> <p>Before we look at the special cases, we need to know the exact causes of double faults. Above, we used a pretty vague definition:</p> <blockquote> <p>A double fault is a special exception that occurs when the CPU fails to invoke an exception handler.</p> </blockquote> <p>What does <em>“fails to invoke”</em> mean exactly? The handler is not present? The handler is <a href="http://pages.cs.wisc.edu/~remzi/OSTEP/vm-beyondphys.pdf">swapped out</a>? 
And what happens if a handler causes exceptions itself?</p> <p>For example, what happens if… :</p> <ol> <li>a divide-by-zero exception occurs, but the corresponding handler function is swapped out?</li> <li>a page fault occurs, but the page fault handler is swapped out?</li> <li>a divide-by-zero handler causes a breakpoint exception, but the breakpoint handler is swapped out?</li> <li>our kernel overflows its stack and the <a href="https://os.phil-opp.com/remap-the-kernel/#creating-a-guard-page">guard page</a> is hit?</li> </ol> <p>Fortunately, the AMD64 manual (<a href="https://www.amd.com/system/files/TechDocs/24593.pdf">PDF</a>) has an exact definition (in Section 8.2.9). According to it, a “double fault exception <em>can</em> occur when a second exception occurs during the handling of a prior (first) exception handler”. The <em>“can”</em> is important: Only very specific combinations of exceptions lead to a double fault. These combinations are:</p> <table><thead><tr><th>First Exception</th><th>Second Exception</th></tr></thead><tbody> <tr><td><a href="https://wiki.osdev.org/Exceptions#Division_Error">Divide-by-zero</a>,<br><a href="https://wiki.osdev.org/Exceptions#Invalid_TSS">Invalid TSS</a>,<br><a href="https://wiki.osdev.org/Exceptions#Segment_Not_Present">Segment Not Present</a>,<br><a href="https://wiki.osdev.org/Exceptions#Stack-Segment_Fault">Stack-Segment Fault</a>,<br><a href="https://wiki.osdev.org/Exceptions#General_Protection_Fault">General Protection Fault</a></td><td><a href="https://wiki.osdev.org/Exceptions#Invalid_TSS">Invalid TSS</a>,<br><a href="https://wiki.osdev.org/Exceptions#Segment_Not_Present">Segment Not Present</a>,<br><a href="https://wiki.osdev.org/Exceptions#Stack-Segment_Fault">Stack-Segment Fault</a>,<br><a href="https://wiki.osdev.org/Exceptions#General_Protection_Fault">General Protection Fault</a></td></tr> <tr><td><a href="https://wiki.osdev.org/Exceptions#Page_Fault">Page Fault</a></td><td><a 
href="https://wiki.osdev.org/Exceptions#Page_Fault">Page Fault</a>,<br><a href="https://wiki.osdev.org/Exceptions#Invalid_TSS">Invalid TSS</a>,<br><a href="https://wiki.osdev.org/Exceptions#Segment_Not_Present">Segment Not Present</a>,<br><a href="https://wiki.osdev.org/Exceptions#Stack-Segment_Fault">Stack-Segment Fault</a>,<br><a href="https://wiki.osdev.org/Exceptions#General_Protection_Fault">General Protection Fault</a></td></tr> </tbody></table> <p>So for example a divide-by-zero fault followed by a page fault is fine (the page fault handler is invoked), but a divide-by-zero fault followed by a general-protection fault leads to a double fault.</p> <p>With the help of this table, we can answer the first three of the above questions:</p> <ol> <li>If a divide-by-zero exception occurs and the corresponding handler function is swapped out, a <em>page fault</em> occurs and the <em>page fault handler</em> is invoked.</li> <li>If a page fault occurs and the page fault handler is swapped out, a <em>double fault</em> occurs and the <em>double fault handler</em> is invoked.</li> <li>If a divide-by-zero handler causes a breakpoint exception, the CPU tries to invoke the breakpoint handler. If the breakpoint handler is swapped out, a <em>page fault</em> occurs and the <em>page fault handler</em> is invoked.</li> </ol> <p>In fact, even the case of a non-present handler follows this scheme: A non-present handler causes a <em>segment-not-present</em> exception. We didn’t define a segment-not-present handler, so another segment-not-present exception occurs. 
According to the table, this leads to a double fault.</p> <h3 id="kernel-stack-overflow"><a class="zola-anchor" href="#kernel-stack-overflow" aria-label="Anchor link for: kernel-stack-overflow">🔗</a>Kernel Stack Overflow</h3> <p>Let’s look at the fourth question:</p> <blockquote> <p>What happens if our kernel overflows its stack and the <a href="https://os.phil-opp.com/remap-the-kernel/#creating-a-guard-page">guard page</a> is hit?</p> </blockquote> <p>When our kernel overflows its stack and hits the guard page, a <em>page fault</em> occurs. The CPU looks up the page fault handler in the IDT and tries to push the <a href="https://os.phil-opp.com/handling-exceptions/#the-exception-stack-frame">exception stack frame</a> onto the stack. However, our current stack pointer still points to the non-present guard page. Thus, a second page fault occurs, which causes a double fault (according to the above table).</p> <p>So the CPU tries to call our <em>double fault handler</em> now. However, on a double fault the CPU tries to push the exception stack frame, too. Our stack pointer still points to the guard page, so a <em>third</em> page fault occurs, which causes a <em>triple fault</em> and a system reboot. So our current double fault handler can’t avoid a triple fault in this case.</p> <p>Let’s try it ourselves! We can easily provoke a kernel stack overflow by calling a function that recurses endlessly:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>rust_main(multiboot_information_address: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> </span><span style="color:#569cd6;">... 
</span><span> </span><span style="color:#608b4e;">// initialize our IDT </span><span> interrupts::init(); </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>stack_overflow() { </span><span> stack_overflow(); </span><span style="color:#608b4e;">// for each recursion, the return address is pushed </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">// trigger a stack overflow </span><span> stack_overflow(); </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>When we try this code in QEMU, we see that the system enters a boot-loop again.</p> <p>So how can we avoid this problem? We can’t omit the pushing of the exception stack frame, since the CPU itself does it. So we need to ensure somehow that the stack is always valid when a double fault exception occurs. Fortunately, the x86_64 architecture has a solution to this problem.</p> <h2 id="switching-stacks"><a class="zola-anchor" href="#switching-stacks" aria-label="Anchor link for: switching-stacks">🔗</a>Switching Stacks</h2> <p>The x86_64 architecture is able to switch to a predefined, known-good stack when an exception occurs. This switch happens at hardware level, so it can be performed before the CPU pushes the exception stack frame.</p> <p>This switching mechanism is implemented as an <em>Interrupt Stack Table</em> (IST). The IST is a table of 7 pointers to known-good stacks. 
In Rust-like pseudo code:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">struct </span><span>InterruptStackTable { </span><span> stack_pointers: [Option&lt;StackPointer&gt;; 7], </span><span>} </span></code></pre> <p>For each exception handler, we can choose a stack from the IST through the <code>options</code> field in the corresponding <a href="https://os.phil-opp.com/handling-exceptions/#the-interrupt-descriptor-table">IDT entry</a>. For example, we could use the first stack in the IST for our double fault handler. Then the CPU would automatically switch to this stack whenever a double fault occurs. This switch would happen before anything is pushed, so it would prevent the triple fault.</p> <h3 id="allocating-a-new-stack"><a class="zola-anchor" href="#allocating-a-new-stack" aria-label="Anchor link for: allocating-a-new-stack">🔗</a>Allocating a new Stack</h3> <p>In order to fill an Interrupt Stack Table later, we need a way to allocate new stacks. 
Therefore we extend our <code>memory</code> module with a new <code>stack_allocator</code> submodule:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/mod.rs </span><span> </span><span style="color:#569cd6;">mod </span><span>stack_allocator; </span><span> </span></code></pre> <p>First, we create a new <code>StackAllocator</code> struct and a constructor function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/stack_allocator.rs </span><span> </span><span style="color:#569cd6;">use </span><span>memory::paging::PageIter; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>StackAllocator { </span><span> range: PageIter, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>StackAllocator { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>new(page_range: PageIter) -&gt; StackAllocator { </span><span> StackAllocator { range: page_range } </span><span> } </span><span>} </span></code></pre> <p>We create a simple <code>StackAllocator</code> that allocates stacks from a given range of pages (<code>PageIter</code> is an iterator over a range of pages; we introduced it <a href="https://os.phil-opp.com/kernel-heap/#mapping-the-heap">in the kernel heap post</a>).</p> <p>We add an <code>alloc_stack</code> method that allocates a new stack:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/stack_allocator.rs </span><span> </span><span style="color:#569cd6;">use </span><span>memory::paging::{self, Page, ActivePageTable}; </span><span style="color:#569cd6;">use </span><span>memory::{</span><span 
style="color:#b4cea8;">PAGE_SIZE</span><span>, FrameAllocator}; </span><span> </span><span style="color:#569cd6;">impl </span><span>StackAllocator { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>alloc_stack&lt;FA: FrameAllocator&gt;(</span><span style="color:#569cd6;">&amp;mut </span><span>self, </span><span> active_table: </span><span style="color:#569cd6;">&amp;mut</span><span> ActivePageTable, </span><span> frame_allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> FA, </span><span> size_in_pages: </span><span style="color:#569cd6;">usize</span><span>) </span><span> -&gt; Option&lt;Stack&gt; { </span><span> </span><span style="color:#569cd6;">if</span><span> size_in_pages == </span><span style="color:#b5cea8;">0 </span><span>{ </span><span> </span><span style="color:#569cd6;">return </span><span>None; </span><span style="color:#608b4e;">/* a zero sized stack makes no sense */ </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">// clone the range, since we only want to change it on success </span><span> </span><span style="color:#569cd6;">let mut</span><span> range = self.range.clone(); </span><span> </span><span> </span><span style="color:#608b4e;">// try to allocate the stack pages and a guard page </span><span> </span><span style="color:#569cd6;">let</span><span> guard_page = range.next(); </span><span> </span><span style="color:#569cd6;">let</span><span> stack_start = range.next(); </span><span> </span><span style="color:#569cd6;">let</span><span> stack_end = </span><span style="color:#569cd6;">if</span><span> size_in_pages == </span><span style="color:#b5cea8;">1 </span><span>{ </span><span> stack_start </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> </span><span style="color:#608b4e;">// choose the (size_in_pages-2)th element, since index </span><span> </span><span style="color:#608b4e;">// starts at 0 and we already allocated the start page 
</span><span> range.nth(size_in_pages - </span><span style="color:#b5cea8;">2</span><span>) </span><span> }; </span><span> </span><span> </span><span style="color:#569cd6;">match </span><span>(guard_page, stack_start, stack_end) { </span><span> (Some(</span><span style="color:#569cd6;">_</span><span>), Some(start), Some(end)) </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#608b4e;">// success! write back updated range </span><span> self.range = range; </span><span> </span><span> </span><span style="color:#608b4e;">// map stack pages to physical frames </span><span> </span><span style="color:#569cd6;">for</span><span> page </span><span style="color:#569cd6;">in </span><span>Page::range_inclusive(start, end) { </span><span> active_table.map(page, paging::</span><span style="color:#b4cea8;">WRITABLE</span><span>, frame_allocator); </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">// create a new stack </span><span> </span><span style="color:#569cd6;">let</span><span> top_of_stack = end.start_address() + </span><span style="color:#b4cea8;">PAGE_SIZE</span><span>; </span><span> Some(Stack::new(top_of_stack, start.start_address())) </span><span> } </span><span> </span><span style="color:#569cd6;">_ =&gt; </span><span>None, </span><span style="color:#608b4e;">/* not enough pages */ </span><span> } </span><span> } </span><span>} </span></code></pre> <p>The method takes mutable references to the <a href="https://os.phil-opp.com/page-tables/#page-table-ownership">ActivePageTable</a> and a <a href="https://os.phil-opp.com/allocating-frames/#a-frame-allocator">FrameAllocator</a>, since it needs to map the new virtual stack pages to physical frames. 
We define that the stack size is a multiple of the page size.</p> <p>Instead of operating directly on <code>self.range</code>, we <a href="https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html#tymethod.clone">clone</a> it and only write it back on success. This way, subsequent stack allocations can still succeed if there are pages left (e.g., a call with <code>size_in_pages = 3</code> can still succeed after a failed call with <code>size_in_pages = 100</code>).</p> <p>In order to be able to clone <code>PageIter</code>, we add a <code>#[derive(Clone)]</code> to its definition in <code>src/memory/paging/mod.rs</code>. We also need to make the <code>start_address</code> method of the <code>Page</code> type public (in the same file).</p> <p>The actual allocation is straightforward: First, we choose the next page as <a href="https://os.phil-opp.com/remap-the-kernel/#creating-a-guard-page">guard page</a>. Then we choose the next <code>size_in_pages</code> pages as stack pages using <a href="https://doc.rust-lang.org/nightly/core/iter/trait.Iterator.html#method.nth">Iterator::nth</a>. If all three variables are <code>Some</code>, the allocation succeeded and we map the stack pages to physical frames using <a href="https://os.phil-opp.com/page-tables/#more-mapping-functions">ActivePageTable::map</a>. 
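</p> <p>To make the page selection concrete, here is a minimal, self-contained sketch of the same guard/start/end logic. It is not part of the kernel: it models pages as plain <code>u64</code> page numbers, uses a standard <code>RangeInclusive</code> in place of <code>PageIter</code>, skips the actual mapping, and the <code>pick_stack_pages</code> name is made up for this illustration.</p>

```rust
use std::ops::RangeInclusive;

/// Hypothetical helper: picks one guard page plus `size_in_pages` stack pages
/// from `range`. Returns (guard page, first stack page, last stack page).
/// `range` is only advanced when the whole request fits.
fn pick_stack_pages(
    range: &mut RangeInclusive<u64>,
    size_in_pages: usize,
) -> Option<(u64, u64, u64)> {
    if size_in_pages == 0 {
        return None; // a zero sized stack makes no sense
    }

    // work on a clone, so that a failed request leaves `range` untouched
    let mut candidate = range.clone();

    let guard = candidate.next()?;
    let start = candidate.next()?;
    let end = if size_in_pages == 1 {
        start
    } else {
        // `nth` is zero-based and the start page is already taken
        candidate.nth(size_in_pages - 2)?
    };

    // success: write back the advanced range
    *range = candidate;
    Some((guard, start, end))
}
```

<p>Starting with six free pages (<code>10..=15</code>), a request for 100 pages returns <code>None</code> and leaves the range unchanged. A following request for 3 pages returns guard page 10 and stack pages 11 to 13, and a request for 1 page after that uses page 14 as guard page and page 15 as the stack.</p> <p>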
The guard page remains unmapped.</p> <p>Finally, we create and return a new <code>Stack</code>, which we define as follows:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/stack_allocator.rs </span><span> </span><span>#[derive(Debug)] </span><span style="color:#569cd6;">pub struct </span><span>Stack { </span><span> top: </span><span style="color:#569cd6;">usize</span><span>, </span><span> bottom: </span><span style="color:#569cd6;">usize</span><span>, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>Stack { </span><span> </span><span style="color:#569cd6;">fn </span><span>new(top: </span><span style="color:#569cd6;">usize</span><span>, bottom: </span><span style="color:#569cd6;">usize</span><span>) -&gt; Stack { </span><span> assert!(top &gt; bottom); </span><span> Stack { </span><span> top: top, </span><span> bottom: bottom, </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>top(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; </span><span style="color:#569cd6;">usize </span><span>{ </span><span> self.top </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>bottom(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; </span><span style="color:#569cd6;">usize </span><span>{ </span><span> self.bottom </span><span> } </span><span>} </span></code></pre> <p>The <code>Stack</code> struct describes a stack through its top and bottom addresses.</p> <h4 id="the-memory-controller"><a class="zola-anchor" href="#the-memory-controller" aria-label="Anchor link for: the-memory-controller">🔗</a>The Memory Controller</h4> <p>Now we’re able to allocate a new double fault stack. However, we add one more level of abstraction to make things easier. 
For that we add a new <code>MemoryController</code> type to our <code>memory</code> module:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/mod.rs </span><span> </span><span style="color:#569cd6;">pub use </span><span>self::stack_allocator::Stack; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>MemoryController { </span><span> active_table: paging::ActivePageTable, </span><span> frame_allocator: AreaFrameAllocator, </span><span> stack_allocator: stack_allocator::StackAllocator, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>MemoryController { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>alloc_stack(</span><span style="color:#569cd6;">&amp;mut </span><span>self, size_in_pages: </span><span style="color:#569cd6;">usize</span><span>) -&gt; Option&lt;Stack&gt; { </span><span> </span><span style="color:#569cd6;">let &amp;mut</span><span> MemoryController { </span><span style="color:#569cd6;">ref mut</span><span> active_table, </span><span> </span><span style="color:#569cd6;">ref mut</span><span> frame_allocator, </span><span> </span><span style="color:#569cd6;">ref mut</span><span> stack_allocator } = self; </span><span> stack_allocator.alloc_stack(active_table, frame_allocator, </span><span> size_in_pages) </span><span> } </span><span>} </span></code></pre> <p>The <code>MemoryController</code> struct holds the three values that are required for <code>alloc_stack</code> and provides a simpler interface (only one argument). The <code>alloc_stack</code> wrapper just takes the three fields as <code>&amp;mut</code> through <a href="https://doc.rust-lang.org/1.10.0/book/patterns.html#destructuring">destructuring</a> and forwards them to the <code>stack_allocator</code>. 
The <a href="https://doc.rust-lang.org/1.30.0/book/second-edition/ch18-03-pattern-syntax.html#creating-references-in-patterns-with-ref-and-ref-mut">ref mut</a>-s are needed to take the inner fields by mutable reference. Note that we’re re-exporting the <code>Stack</code> type since it is returned by <code>alloc_stack</code>.</p> <p>The last step is to create a <code>StackAllocator</code> and return a <code>MemoryController</code> from <code>memory::init</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/mod.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init(boot_info: </span><span style="color:#569cd6;">&amp;</span><span>BootInformation) -&gt; MemoryController { </span><span> </span><span style="color:#569cd6;">... </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> stack_allocator = { </span><span> </span><span style="color:#569cd6;">let</span><span> stack_alloc_start = heap_end_page + </span><span style="color:#b5cea8;">1</span><span>; </span><span> </span><span style="color:#569cd6;">let</span><span> stack_alloc_end = stack_alloc_start + </span><span style="color:#b5cea8;">100</span><span>; </span><span> </span><span style="color:#569cd6;">let</span><span> stack_alloc_range = Page::range_inclusive(stack_alloc_start, </span><span> stack_alloc_end); </span><span> stack_allocator::StackAllocator::new(stack_alloc_range) </span><span> }; </span><span> </span><span> MemoryController { </span><span> active_table: active_table, </span><span> frame_allocator: frame_allocator, </span><span> stack_allocator: stack_allocator, </span><span> } </span><span>} </span></code></pre> <p>We create a new <code>StackAllocator</code> with a range of 100 pages starting right after the last heap page.</p> <p>In order to do arithmetic on pages (e.g. 
calculate the hundredth page after <code>stack_alloc_start</code>), we implement <code>Add&lt;usize&gt;</code> for <code>Page</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/paging/mod.rs </span><span> </span><span style="color:#569cd6;">use </span><span>core::ops::Add; </span><span> </span><span style="color:#569cd6;">impl </span><span>Add&lt;</span><span style="color:#569cd6;">usize</span><span>&gt; </span><span style="color:#569cd6;">for </span><span>Page { </span><span> </span><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">Output </span><span>= Page; </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>add(self, rhs: </span><span style="color:#569cd6;">usize</span><span>) -&gt; Page { </span><span> Page { number: self.number + rhs } </span><span> } </span><span>} </span></code></pre> <h4 id="allocating-a-double-fault-stack"><a class="zola-anchor" href="#allocating-a-double-fault-stack" aria-label="Anchor link for: allocating-a-double-fault-stack">🔗</a>Allocating a Double Fault Stack</h4> <p>Now we can allocate a new double fault stack by passing the memory controller to our <code>interrupts::init</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>rust_main(multiboot_information_address: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> </span><span style="color:#569cd6;">... 
</span><span> </span><span> </span><span style="color:#608b4e;">// set up guard page and map the heap pages </span><span> </span><span style="color:#569cd6;">let mut</span><span> memory_controller = memory::init(boot_info); </span><span style="color:#608b4e;">// new return type </span><span> </span><span> </span><span style="color:#608b4e;">// initialize our IDT </span><span> interrupts::init(</span><span style="color:#569cd6;">&amp;mut</span><span> memory_controller); </span><span style="color:#608b4e;">// new argument </span><span> </span><span> </span><span style="color:#569cd6;">... </span><span>} </span><span> </span><span> </span><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">use </span><span>memory::MemoryController; </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init(memory_controller: </span><span style="color:#569cd6;">&amp;mut</span><span> MemoryController) { </span><span> </span><span style="color:#569cd6;">let</span><span> double_fault_stack = memory_controller.alloc_stack(</span><span style="color:#b5cea8;">1</span><span>) </span><span> .expect(</span><span style="color:#d69d85;">&quot;could not allocate double fault stack&quot;</span><span>); </span><span> </span><span> </span><span style="color:#b4cea8;">IDT</span><span>.load(); </span><span>} </span></code></pre> <p>We allocate a 4096-byte stack (one page) for our double fault handler. Now we just need some way to tell the CPU that it should use this stack for handling double faults.</p> <h3 id="the-ist-and-tss"><a class="zola-anchor" href="#the-ist-and-tss" aria-label="Anchor link for: the-ist-and-tss">🔗</a>The IST and TSS</h3> <p>The Interrupt Stack Table (IST) is part of a legacy structure called the <em><a href="https://en.wikipedia.org/wiki/Task_state_segment">Task State Segment</a></em> (TSS). The TSS used to hold various information (e.g. 
processor register state) about a task in 32-bit mode and was for example used for <a href="https://wiki.osdev.org/Context_Switching#Hardware_Context_Switching">hardware context switching</a>. However, hardware context switching is no longer supported in 64-bit mode and the format of the TSS changed completely.</p> <p>On x86_64, the TSS no longer holds any task specific information at all. Instead, it holds two stack tables (the IST is one of them). The only common field between the 32-bit and 64-bit TSS is the pointer to the <a href="https://en.wikipedia.org/wiki/Task_state_segment#I.2FO_port_permissions">I/O port permissions bitmap</a>.</p> <p>The 64-bit TSS has the following format:</p> <table><thead><tr><th>Field</th><th>Type</th></tr></thead><tbody> <tr><td><span style="opacity: 0.5">(reserved)</span></td><td><code>u32</code></td></tr> <tr><td>Privilege Stack Table</td><td><code>[u64; 3]</code></td></tr> <tr><td><span style="opacity: 0.5">(reserved)</span></td><td><code>u64</code></td></tr> <tr><td>Interrupt Stack Table</td><td><code>[u64; 7]</code></td></tr> <tr><td><span style="opacity: 0.5">(reserved)</span></td><td><code>u64</code></td></tr> <tr><td><span style="opacity: 0.5">(reserved)</span></td><td><code>u16</code></td></tr> <tr><td>I/O Map Base Address</td><td><code>u16</code></td></tr> </tbody></table> <p>The <em>Privilege Stack Table</em> is used by the CPU when the privilege level changes. For example, if an exception occurs while the CPU is in user mode (privilege level 3), the CPU normally switches to kernel mode (privilege level 0) before invoking the exception handler. In that case, the CPU would switch to the 0th stack in the Privilege Stack Table (since 0 is the target privilege level). 
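</p> <p>As a rough illustration, the table above can be written down as a packed, C-style struct. This is only a sketch for checking the layout on a normal host (hence <code>std</code> instead of <code>core</code>); the <code>TssLayout</code> and <code>tss_size</code> names and the field names are made up here, and the kernel itself uses the <code>TaskStateSegment</code> type of the <code>x86_64</code> crate instead.</p>

```rust
use std::mem::size_of;

/// Hypothetical mirror of the 64-bit TSS format from the table above.
/// `packed` is required because the `u32` at the start would otherwise
/// cause padding before the first `u64` array.
#[allow(dead_code)]
#[repr(C, packed)]
struct TssLayout {
    reserved_1: u32,
    privilege_stack_table: [u64; 3],
    reserved_2: u64,
    interrupt_stack_table: [u64; 7],
    reserved_3: u64,
    reserved_4: u16,
    iomap_base: u16,
}

/// Size of the sketched layout in bytes (4 + 24 + 8 + 56 + 8 + 2 + 2).
fn tss_size() -> usize {
    size_of::<TssLayout>()
}
```

<p>The fields add up to 104 bytes. The <code>privilege_stack_table</code> field is the three-entry Privilege Stack Table described above.</p> <p>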
We don’t have any user mode programs yet, so we ignore this table for now.</p> <h4 id="creating-a-tss"><a class="zola-anchor" href="#creating-a-tss" aria-label="Anchor link for: creating-a-tss">🔗</a>Creating a TSS</h4> <p>Let’s create a new TSS that contains our double fault stack in its interrupt stack table. For that we need a TSS struct. Fortunately, the <code>x86_64</code> crate already contains a <a href="https://docs.rs/x86_64/0.1.1/x86_64/structures/tss/struct.TaskStateSegment.html"><code>TaskStateSegment</code> struct</a> that we can use:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::tss::TaskStateSegment; </span></code></pre> <p>Let’s create a new TSS in our <code>interrupts::init</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::VirtualAddress; </span><span> </span><span style="color:#569cd6;">const </span><span style="color:#b4cea8;">DOUBLE_FAULT_IST_INDEX</span><span>: </span><span style="color:#569cd6;">usize </span><span>= </span><span style="color:#b5cea8;">0</span><span>; </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init(memory_controller: </span><span style="color:#569cd6;">&amp;mut</span><span> MemoryController) { </span><span> </span><span style="color:#569cd6;">let</span><span> double_fault_stack = memory_controller.alloc_stack(</span><span style="color:#b5cea8;">1</span><span>) </span><span> .expect(</span><span style="color:#d69d85;">&quot;could not allocate double fault stack&quot;</span><span>); </span><span> </span><span> </span><span 
style="color:#569cd6;">let mut</span><span> tss = TaskStateSegment::new(); </span><span> tss.interrupt_stack_table[</span><span style="color:#b4cea8;">DOUBLE_FAULT_IST_INDEX</span><span>] = VirtualAddress( </span><span> double_fault_stack.top()); </span><span> </span><span> </span><span style="color:#b4cea8;">IDT</span><span>.load(); </span><span>} </span></code></pre> <p>We define that the 0th IST entry is the double fault stack (any other IST index would work too). We create a new TSS through the <code>TaskStateSegment::new</code> function and load the top address (stacks grow downwards) of the double fault stack into the 0th entry.</p> <h4 id="loading-the-tss"><a class="zola-anchor" href="#loading-the-tss" aria-label="Anchor link for: loading-the-tss">🔗</a>Loading the TSS</h4> <p>Now that we have created a new TSS, we need a way to tell the CPU that it should use it. Unfortunately, this is a bit cumbersome, since the TSS is a Task State <em>Segment</em> (for historical reasons). So instead of loading the table directly, we need to add a new segment descriptor to the <a href="https://web.archive.org/web/20190217233448/https://www.flingos.co.uk/docs/reference/Global-Descriptor-Table/">Global Descriptor Table</a> (GDT). Then we can load our TSS by invoking the <a href="https://www.felixcloutier.com/x86/ltr"><code>ltr</code> instruction</a> with the respective GDT index.</p> <h3 id="the-global-descriptor-table-again"><a class="zola-anchor" href="#the-global-descriptor-table-again" aria-label="Anchor link for: the-global-descriptor-table-again">🔗</a>The Global Descriptor Table (again)</h3> <p>The Global Descriptor Table (GDT) is a relic that was used for <a href="https://en.wikipedia.org/wiki/X86_memory_segmentation">memory segmentation</a> before paging became the de facto standard. 
It is still needed in 64-bit mode for various things such as kernel/user mode configuration or TSS loading.</p> <p>We already created a GDT <a href="https://os.phil-opp.com/entering-longmode/#the-global-descriptor-table">when switching to long mode</a>. Back then, we used assembly to create valid code and data segment descriptors, which were required to enter 64-bit mode. We could just edit that assembly file and add an additional TSS descriptor. However, we now have the expressiveness of Rust, so let’s do it in Rust instead.</p> <p>We start by creating a new <code>interrupts::gdt</code> submodule. For that we need to rename the <code>src/interrupts.rs</code> file to <code>src/interrupts/mod.rs</code>. Then we can create a new submodule:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span style="color:#569cd6;">mod </span><span>gdt; </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// src/interrupts/gdt.rs </span><span> </span><span style="color:#569cd6;">pub struct </span><span>Gdt { </span><span> table: [</span><span style="color:#569cd6;">u64</span><span>; 8], </span><span> next_free: </span><span style="color:#569cd6;">usize</span><span>, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>Gdt { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>new() -&gt; Gdt { </span><span> Gdt { </span><span> table: [</span><span style="color:#b5cea8;">0</span><span>; </span><span style="color:#b5cea8;">8</span><span>], </span><span> next_free: </span><span style="color:#b5cea8;">1</span><span>, </span><span> } </span><span> } </span><span>} </span></code></pre> <p>We create a simple <code>Gdt</code> struct with two 
fields. The <code>table</code> field contains the actual GDT modeled as a <code>[u64; 8]</code>. Theoretically, a GDT can have up to 8192 entries, but this doesn’t make much sense in 64-bit mode (since there is no real segmentation support). Eight entries should be more than enough for our system.</p> <p>The <code>next_free</code> field stores the index of the next free entry. We initialize it with <code>1</code> since the 0th entry always needs to be 0 in a valid GDT.</p> <h4 id="user-and-system-segments"><a class="zola-anchor" href="#user-and-system-segments" aria-label="Anchor link for: user-and-system-segments">🔗</a>User and System Segments</h4> <p>There are two types of GDT entries in long mode: user and system segment descriptors. Descriptors for code and data segments are user segment descriptors. They contain no addresses since segments always span the complete address space on x86_64 (real segmentation is no longer supported). Thus, user segment descriptors only contain a few flags (e.g. present or user mode) and fit into a single <code>u64</code> entry.</p> <p>System descriptors such as TSS descriptors are different. They often contain a base address and a limit (e.g. TSS start and length) and thus need more than 64 bits. Therefore, system segment descriptors are 128 bits wide. 
They are stored as two consecutive entries in the GDT.</p> <p>Consequently, we model a <code>Descriptor</code> as an <code>enum</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/gdt.rs </span><span> </span><span style="color:#569cd6;">pub enum </span><span>Descriptor { </span><span> UserSegment(</span><span style="color:#569cd6;">u64</span><span>), </span><span> SystemSegment(</span><span style="color:#569cd6;">u64</span><span>, </span><span style="color:#569cd6;">u64</span><span>), </span><span>} </span></code></pre> <p>The flag bits are common between all descriptor types, so we create a general <code>DescriptorFlags</code> type (using the <a href="https://docs.rs/bitflags/0.9.1/bitflags/macro.bitflags.html">bitflags</a> macro):</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/gdt.rs </span><span> </span><span>bitflags! { </span><span> </span><span style="color:#569cd6;">struct </span><span>DescriptorFlags: u64 { </span><span> const CONFORMING = 1 &lt;&lt; 42; </span><span> const EXECUTABLE = 1 &lt;&lt; 43; </span><span> const USER_SEGMENT = 1 &lt;&lt; 44; </span><span> const PRESENT = 1 &lt;&lt; 47; </span><span> const LONG_MODE = 1 &lt;&lt; 53; </span><span> } </span><span>} </span></code></pre> <p>We only add flags that are relevant in 64-bit mode. 
For example, we omit the read/write bit, since it is completely ignored by the CPU in 64-bit mode.</p> <h4 id="code-segments"><a class="zola-anchor" href="#code-segments" aria-label="Anchor link for: code-segments">🔗</a>Code Segments</h4> <p>We add a function to create kernel mode code segments:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/gdt.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Descriptor { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>kernel_code_segment() -&gt; Descriptor { </span><span> </span><span style="color:#569cd6;">let</span><span> flags = </span><span style="color:#b4cea8;">USER_SEGMENT </span><span style="color:#569cd6;">| </span><span style="color:#b4cea8;">PRESENT </span><span style="color:#569cd6;">| </span><span style="color:#b4cea8;">EXECUTABLE </span><span style="color:#569cd6;">| </span><span style="color:#b4cea8;">LONG_MODE</span><span>; </span><span> Descriptor::UserSegment(flags.bits()) </span><span> } </span><span>} </span></code></pre> <p>We set the <code>USER_SEGMENT</code> bit to indicate a 64 bit user segment descriptor (otherwise the CPU expects a 128 bit system segment descriptor). 
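</p> <p>To see what this descriptor looks like as a raw number, the following sketch combines the same four flags without the <code>bitflags</code> macro. The bit positions are copied from the <code>DescriptorFlags</code> definition above; the plain constants and the <code>kernel_code_segment_bits</code> name are only used for this illustration.</p>

```rust
// Bit positions as in the `DescriptorFlags` definition.
const EXECUTABLE: u64 = 1 << 43;
const USER_SEGMENT: u64 = 1 << 44;
const PRESENT: u64 = 1 << 47;
const LONG_MODE: u64 = 1 << 53;

/// Hypothetical helper: the raw 64-bit GDT entry for a kernel code segment,
/// built from the same flags as `Descriptor::kernel_code_segment`.
fn kernel_code_segment_bits() -> u64 {
    USER_SEGMENT | PRESENT | EXECUTABLE | LONG_MODE
}
```

<p>The resulting entry is <code>0x0020_9800_0000_0000</code>: only these four bits are set, and all base and limit bits stay zero.</p> <p>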
The <code>PRESENT</code>, <code>EXECUTABLE</code>, and <code>LONG_MODE</code> bits are also needed for a 64-bit mode code segment.</p> <p>The data segment registers <code>ds</code>, <code>ss</code>, and <code>es</code> are completely ignored in 64-bit mode, so we don’t need any data segment descriptors in our GDT.</p> <h4 id="tss-segments"><a class="zola-anchor" href="#tss-segments" aria-label="Anchor link for: tss-segments">🔗</a>TSS Segments</h4> <p>A TSS descriptor is a system segment descriptor with the following format:</p> <table><thead><tr><th>Bit(s)</th><th>Name</th><th>Meaning</th></tr></thead><tbody> <tr><td>0-15</td><td><strong>limit 0-15</strong></td><td>the first 2 bytes of the TSS’s limit</td></tr> <tr><td>16-39</td><td><strong>base 0-23</strong></td><td>the first 3 bytes of the TSS’s base address</td></tr> <tr><td>40-43</td><td><strong>type</strong></td><td>must be <code>0b1001</code> for an available 64-bit TSS</td></tr> <tr><td>44</td><td>zero</td><td>must be 0</td></tr> <tr><td>45-46</td><td>privilege</td><td>the <a href="https://wiki.osdev.org/Security#Rings">ring level</a>: 0 for kernel, 3 for user</td></tr> <tr><td>47</td><td><strong>present</strong></td><td>must be 1 for valid descriptors</td></tr> <tr><td>48-51</td><td>limit 16-19</td><td>bits 16 to 19 of the segment’s limit</td></tr> <tr><td>52</td><td>available</td><td>freely available to the OS</td></tr> <tr><td>53-54</td><td>ignored</td><td></td></tr> <tr><td>55</td><td>granularity</td><td>if it’s set, the limit is the number of pages, else it’s a byte number</td></tr> <tr><td>56-63</td><td><strong>base 24-31</strong></td><td>the fourth byte of the base address</td></tr> <tr><td>64-95</td><td><strong>base 32-63</strong></td><td>the last four bytes of the base address</td></tr> <tr><td>96-127</td><td>ignored/must be zero</td><td>bits 104-108 must be zero, the rest is ignored</td></tr> </tbody></table> <p>We only need the bold fields for our TSS descriptor. 
For example, we don’t need the <code>limit 16-19</code> field since a TSS has a fixed size that is smaller than <code>2^16</code>.</p> <p>Let’s add a function to our descriptor that creates a TSS descriptor for a given TSS:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/gdt.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::tss::TaskStateSegment; </span><span> </span><span style="color:#569cd6;">impl </span><span>Descriptor { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>tss_segment(tss: </span><span style="color:#569cd6;">&amp;&#39;static</span><span> TaskStateSegment) -&gt; Descriptor { </span><span> </span><span style="color:#569cd6;">use </span><span>core::mem::size_of; </span><span> </span><span style="color:#569cd6;">use </span><span>bit_field::BitField; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> ptr = tss </span><span style="color:#569cd6;">as *const _ as u64</span><span>; </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> low = </span><span style="color:#b4cea8;">PRESENT</span><span>.bits(); </span><span> </span><span style="color:#608b4e;">// base </span><span> low.set_bits(</span><span style="color:#b5cea8;">16</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">40</span><span>, ptr.get_bits(</span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">24</span><span>)); </span><span> low.set_bits(</span><span style="color:#b5cea8;">56</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">64</span><span>, ptr.get_bits(</span><span style="color:#b5cea8;">24</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">32</span><span>)); </span><span> </span><span 
style="color:#608b4e;">// limit (the `-1` is needed since the bound is inclusive) </span><span> low.set_bits(</span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">16</span><span>, (size_of::&lt;TaskStateSegment&gt;() - </span><span style="color:#b5cea8;">1</span><span>) </span><span style="color:#569cd6;">as u64</span><span>); </span><span> </span><span style="color:#608b4e;">// type (0b1001 = available 64-bit tss) </span><span> low.set_bits(</span><span style="color:#b5cea8;">40</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">44</span><span>, </span><span style="color:#b5cea8;">0b1001</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> high = </span><span style="color:#b5cea8;">0</span><span>; </span><span> high.set_bits(</span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">32</span><span>, ptr.get_bits(</span><span style="color:#b5cea8;">32</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">64</span><span>)); </span><span> </span><span> Descriptor::SystemSegment(low, high) </span><span> } </span><span>} </span></code></pre> <p>The <code>set_bits</code> and <code>get_bits</code> methods are provided by the <a href="https://docs.rs/bit_field/0.6.0/bit_field/trait.BitField.html#method.get_bit"><code>BitField</code> trait</a> of the <code>bit_field</code> crate. They allow us to easily get or set specific bits in an integer without using bit masks or shift operations. 
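</p> <p>To see what these two methods spare us, here is a minimal sketch of the same operations written with explicit masks and shifts. It uses only the standard library; the free functions <code>get_bits</code> and <code>set_bits</code> are hypothetical stand-ins for illustration, not the crate’s implementation:</p>
<pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust">use core::ops::Range;

/// Builds a mask with the lowest `width` bits set (1 &lt;= width &lt;= 64).
fn low_mask(width: u32) -&gt; u64 {
    if width == 64 { u64::MAX } else { (1u64 &lt;&lt; width) - 1 }
}

/// Hypothetical stand-in for `get_bits`: extracts the bits in `range`.
fn get_bits(value: u64, range: Range&lt;u32&gt;) -&gt; u64 {
    assert!(range.start &lt; range.end &amp;&amp; range.end &lt;= 64);
    (value &gt;&gt; range.start) &amp; low_mask(range.end - range.start)
}

/// Hypothetical stand-in for `set_bits`: overwrites the bits in `range`.
fn set_bits(value: u64, range: Range&lt;u32&gt;, bits: u64) -&gt; u64 {
    assert!(range.start &lt; range.end &amp;&amp; range.end &lt;= 64);
    let mask = low_mask(range.end - range.start);
    // The new value must fit into the target range.
    assert!(bits &amp; !mask == 0, "value does not fit into bit range");
    (value &amp; !(mask &lt;&lt; range.start)) | (bits &lt;&lt; range.start)
}
</code></pre>
<p>Unlike the trait methods, these helpers return the new value instead of modifying the integer in place, which keeps the sketch short.</p> <p>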
For example, we can do <code>x.set_bits(8..16, 42)</code> instead of <code>x = (x &amp; 0xffff00ff) | (42 &lt;&lt; 8)</code>.</p> <p>To link the <code>bit_field</code> crate, we modify our <code>Cargo.toml</code> and our <code>src/lib.rs</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">bit_field </span><span>= </span><span style="color:#d69d85;">&quot;0.7.0&quot; </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">extern crate</span><span> bit_field; </span></code></pre> <p>We require the <code>'static</code> lifetime for the <code>TaskStateSegment</code> reference, since the hardware might access it on every interrupt as long as the OS runs.</p> <h4 id="adding-descriptors-to-the-gdt"><a class="zola-anchor" href="#adding-descriptors-to-the-gdt" aria-label="Anchor link for: adding-descriptors-to-the-gdt">🔗</a>Adding Descriptors to the GDT</h4> <p>In order to add descriptors to the GDT, we add an <code>add_entry</code> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/gdt.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::gdt::SegmentSelector; </span><span style="color:#569cd6;">use </span><span>x86_64::PrivilegeLevel; </span><span> </span><span style="color:#569cd6;">impl </span><span>Gdt { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>add_entry(</span><span style="color:#569cd6;">&amp;mut </span><span>self, entry: Descriptor) -&gt; SegmentSelector { </span><span> </span><span style="color:#569cd6;">let</span><span> 
index = </span><span style="color:#569cd6;">match</span><span> entry { </span><span> Descriptor::UserSegment(value) </span><span style="color:#569cd6;">=&gt; </span><span>self.push(value), </span><span> Descriptor::SystemSegment(value_low, value_high) </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> index = self.push(value_low); </span><span> self.push(value_high); </span><span> index </span><span> } </span><span> }; </span><span> SegmentSelector::new(index </span><span style="color:#569cd6;">as u16</span><span>, PrivilegeLevel::Ring0) </span><span> } </span><span>} </span></code></pre> <p>For a user segment we just push the <code>u64</code> and remember the index. For a system segment, we push the low and high <code>u64</code> and use the index of the low value. We then use this index to return a new <a href="https://docs.rs/x86/0.8.0/x86/shared/segmentation/struct.SegmentSelector.html#method.new">SegmentSelector</a>.</p> <p>The <code>push</code> method looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/gdt.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Gdt { </span><span> </span><span style="color:#569cd6;">fn </span><span>push(</span><span style="color:#569cd6;">&amp;mut </span><span>self, value: </span><span style="color:#569cd6;">u64</span><span>) -&gt; </span><span style="color:#569cd6;">usize </span><span>{ </span><span> </span><span style="color:#569cd6;">if </span><span>self.next_free &lt; self.table.len() { </span><span> </span><span style="color:#569cd6;">let</span><span> index = self.next_free; </span><span> self.table[index] = value; </span><span> self.next_free += </span><span style="color:#b5cea8;">1</span><span>; </span><span> index </span><span> } </span><span style="color:#569cd6;">else 
</span><span>{ </span><span> panic!(</span><span style="color:#d69d85;">&quot;GDT full&quot;</span><span>); </span><span> } </span><span> } </span><span>} </span></code></pre> <p>The method just writes to the <code>next_free</code> entry and returns the corresponding index. If there is no free entry left, we panic since this likely indicates a programming error (we should never need to create more than two or three GDT entries for our kernel).</p> <h4 id="loading-the-gdt"><a class="zola-anchor" href="#loading-the-gdt" aria-label="Anchor link for: loading-the-gdt">🔗</a>Loading the GDT</h4> <p>To load the GDT, we add a new <code>load</code> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/gdt.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Gdt { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>load(</span><span style="color:#569cd6;">&amp;&#39;static </span><span>self) { </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::tables::{DescriptorTablePointer, lgdt}; </span><span> </span><span style="color:#569cd6;">use </span><span>core::mem::size_of; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> ptr = DescriptorTablePointer { </span><span> base: self.table.as_ptr() </span><span style="color:#569cd6;">as u64</span><span>, </span><span> limit: (self.table.len() * size_of::&lt;</span><span style="color:#569cd6;">u64</span><span>&gt;() - </span><span style="color:#b5cea8;">1</span><span>) </span><span style="color:#569cd6;">as u16</span><span>, </span><span> }; </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ lgdt(</span><span style="color:#569cd6;">&amp;</span><span>ptr) }; </span><span> } </span><span>} </span></code></pre> <p>We use the <a 
href="https://docs.rs/x86_64/0.1.1/x86_64/instructions/tables/struct.DescriptorTablePointer.html"><code>DescriptorTablePointer</code> struct</a> and the <a href="https://docs.rs/x86_64/0.1.1/x86_64/instructions/tables/fn.lgdt.html"><code>lgdt</code> function</a> provided by the <code>x86_64</code> crate to load our GDT. Again, we require a <code>'static</code> reference since the GDT possibly needs to live for the rest of the run time.</p> <h3 id="putting-it-together"><a class="zola-anchor" href="#putting-it-together" aria-label="Anchor link for: putting-it-together">🔗</a>Putting it together</h3> <p>We now have a double fault stack and are able to create and load a TSS (which contains an IST). So let’s put everything together to catch kernel stack overflows.</p> <p>We already created a new TSS in our <code>interrupts::init</code> function. Now we can load this TSS by creating a new GDT:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init(memory_controller: </span><span style="color:#569cd6;">&amp;mut</span><span> MemoryController) { </span><span> </span><span style="color:#569cd6;">let</span><span> double_fault_stack = memory_controller.alloc_stack(</span><span style="color:#b5cea8;">1</span><span>) </span><span> .expect(</span><span style="color:#d69d85;">&quot;could not allocate double fault stack&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> tss = TaskStateSegment::new(); </span><span> tss.interrupt_stack_table[</span><span style="color:#b4cea8;">DOUBLE_FAULT_IST_INDEX</span><span>] = VirtualAddress( </span><span> double_fault_stack.top()); </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> gdt = gdt::Gdt::new(); </span><span> </span><span 
style="color:#569cd6;">let</span><span> code_selector = gdt.add_entry(gdt::Descriptor::kernel_code_segment()); </span><span> </span><span style="color:#569cd6;">let</span><span> tss_selector = gdt.add_entry(gdt::Descriptor::tss_segment(</span><span style="color:#569cd6;">&amp;</span><span>tss)); </span><span> gdt.load(); </span><span> </span><span> </span><span style="color:#b4cea8;">IDT</span><span>.load(); </span><span>} </span></code></pre> <p>However, when we try to compile it, the following errors occur:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: `tss` does not live long enough </span><span> --&gt; src/interrupts/mod.rs:118:68 </span><span> | </span><span>118 | let tss_selector = gdt.add_entry(gdt::Descriptor::tss_segment(&amp;tss)); </span><span> | does not live long enough ^^^ </span><span>... </span><span>122 | } </span><span> | - borrowed value only lives until here </span><span> | </span><span> = note: borrowed value must be valid for the static lifetime... </span><span> </span><span>error: `gdt` does not live long enough </span><span> --&gt; src/interrupts/mod.rs:119:5 </span><span> | </span><span>119 | gdt.load(); </span><span> | ^^^ does not live long enough </span><span>... </span><span>122 | } </span><span> | - borrowed value only lives until here </span><span> | </span><span> = note: borrowed value must be valid for the static lifetime... </span></code></pre> <p>The problem is that we require that the TSS and GDT are valid for the rest of the run time (i.e. for the <code>'static</code> lifetime). But our created <code>tss</code> and <code>gdt</code> live on the stack and are thus destroyed at the end of the <code>init</code> function. 
So how do we fix this problem?</p> <p>We could allocate our TSS and GDT on the heap using <code>Box</code> and use <a href="https://doc.rust-lang.org/std/boxed/struct.Box.html#method.into_raw">into_raw</a> and a bit of <code>unsafe</code> to convert it to <code>&amp;'static</code> references (<a href="https://github.com/rust-lang/rfcs/pull/1233">RFC 1233</a> was closed unfortunately).</p> <p>Alternatively, we could store them in a <code>static</code> somehow. The <a href="https://docs.rs/lazy_static/0.2.2/lazy_static/"><code>lazy_static</code> macro</a> doesn’t work here, since we need access to the <code>MemoryController</code> for initialization. However, we can use its fundamental building block, the <a href="https://docs.rs/spin/0.4.5/spin/struct.Once.html"><code>spin::Once</code> type</a>.</p> <h4 id="spin-once"><a class="zola-anchor" href="#spin-once" aria-label="Anchor link for: spin-once">🔗</a>spin::Once</h4> <p>Let’s try to solve our problem using <a href="https://docs.rs/spin/0.4.5/spin/struct.Once.html"><code>spin::Once</code></a>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span style="color:#569cd6;">use </span><span>spin::Once; </span><span> </span><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">TSS</span><span>: Once&lt;TaskStateSegment&gt; = Once::new(); </span><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">GDT</span><span>: Once&lt;gdt::Gdt&gt; = Once::new(); </span></code></pre> <p>The <code>Once</code> type allows us to initialize a <code>static</code> at runtime. 
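</p> <p>The same run-once pattern can be tried out on a hosted system with the standard library. The following sketch is an assumption for illustration only: it uses <code>std::sync::OnceLock</code>, which is not available in our <code>no_std</code> kernel, and the <code>FakeTss</code> type and <code>init_tss</code> function are hypothetical:</p>
<pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust">use std::sync::OnceLock;

/// Hypothetical stand-in for a structure that needs runtime data.
struct FakeTss {
    double_fault_stack_top: u64,
}

// The static starts out empty and is filled in at runtime.
static TSS: OnceLock&lt;FakeTss&gt; = OnceLock::new();

/// Initializes the static on the first call and returns a `'static`
/// reference; later calls return the existing value unchanged.
fn init_tss(stack_top: u64) -&gt; &amp;'static FakeTss {
    TSS.get_or_init(|| FakeTss { double_fault_stack_top: stack_top })
}
</code></pre>
<p>The closure can capture values that only exist at runtime, and the returned reference is valid for the <code>'static</code> lifetime because the value lives inside the static.</p> <p>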
Using <code>Once</code> is safe because the only way to access the static value is through the provided methods (<a href="https://docs.rs/spin/0.4.5/spin/struct.Once.html#method.call_once">call_once</a>, <a href="https://docs.rs/spin/0.4.5/spin/struct.Once.html#method.try">try</a>, and <a href="https://docs.rs/spin/0.4.5/spin/struct.Once.html#method.wait">wait</a>). Thus, no value can be read before initialization and the value can only be initialized once.</p> <p>(The <code>Once</code> type was added in spin 0.4, so you probably need to update your spin dependency.)</p> <p>So let’s rewrite our <code>interrupts::init</code> function to use the static <code>TSS</code> and <code>GDT</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>init(memory_controller: </span><span style="color:#569cd6;">&amp;mut</span><span> MemoryController) { </span><span> </span><span style="color:#569cd6;">let</span><span> double_fault_stack = memory_controller.alloc_stack(</span><span style="color:#b5cea8;">1</span><span>) </span><span> .expect(</span><span style="color:#d69d85;">&quot;could not allocate double fault stack&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> tss = </span><span style="color:#b4cea8;">TSS</span><span>.call_once(|| { </span><span> </span><span style="color:#569cd6;">let mut</span><span> tss = TaskStateSegment::new(); </span><span> tss.interrupt_stack_table[</span><span style="color:#b4cea8;">DOUBLE_FAULT_IST_INDEX</span><span>] = VirtualAddress( </span><span> double_fault_stack.top()); </span><span> tss </span><span> }); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> gdt = </span><span style="color:#b4cea8;">GDT</span><span>.call_once(|| { </span><span> </span><span style="color:#569cd6;">let mut</span><span> gdt = gdt::Gdt::new(); </span><span> 
</span><span style="color:#569cd6;">let</span><span> code_selector = gdt.add_entry(gdt::Descriptor:: </span><span> kernel_code_segment()); </span><span> </span><span style="color:#569cd6;">let</span><span> tss_selector = gdt.add_entry(gdt::Descriptor::tss_segment(</span><span style="color:#569cd6;">&amp;</span><span>tss)); </span><span> gdt </span><span> }); </span><span> gdt.load(); </span><span> </span><span> </span><span style="color:#b4cea8;">IDT</span><span>.load(); </span><span>} </span></code></pre> <p>Now it should compile again!</p> <h4 id="the-final-steps"><a class="zola-anchor" href="#the-final-steps" aria-label="Anchor link for: the-final-steps">🔗</a>The Final Steps</h4> <p>We’re almost done. We successfully loaded our new GDT, which contains a TSS descriptor. Now there are just a few steps left:</p> <ol> <li>We changed our GDT, so we should reload <code>cs</code>, the code segment register. This is required since the old segment selector could now point to a different GDT descriptor (e.g. a TSS descriptor).</li> <li>We loaded a GDT that contains a TSS descriptor, but we still need to tell the CPU that it should use that TSS.</li> <li>As soon as our TSS is loaded, the CPU has access to a valid interrupt stack table (IST). Then we can tell the CPU that it should use our new double fault stack by modifying our double fault IDT entry.</li> </ol> <p>For the first two steps, we need access to the <code>code_selector</code> and <code>tss_selector</code> variables outside of the closure. 
We can achieve this by moving the <code>let</code> declarations out of the closure:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span style="color:#569cd6;">pub fn </span><span>init(memory_controller: </span><span style="color:#569cd6;">&amp;mut</span><span> MemoryController) { </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::structures::gdt::SegmentSelector; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::segmentation::set_cs; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::tables::load_tss; </span><span> </span><span style="color:#569cd6;">... </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> code_selector = SegmentSelector(</span><span style="color:#b5cea8;">0</span><span>); </span><span> </span><span style="color:#569cd6;">let mut</span><span> tss_selector = SegmentSelector(</span><span style="color:#b5cea8;">0</span><span>); </span><span> </span><span style="color:#569cd6;">let</span><span> gdt = </span><span style="color:#b4cea8;">GDT</span><span>.call_once(|| { </span><span> </span><span style="color:#569cd6;">let mut</span><span> gdt = gdt::Gdt::new(); </span><span> code_selector = gdt.add_entry(gdt::Descriptor::kernel_code_segment()); </span><span> tss_selector = gdt.add_entry(gdt::Descriptor::tss_segment(</span><span style="color:#569cd6;">&amp;</span><span>tss)); </span><span> gdt </span><span> }); </span><span> gdt.load(); </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#608b4e;">// reload code segment register </span><span> set_cs(code_selector); </span><span> </span><span style="color:#608b4e;">// load TSS </span><span> load_tss(tss_selector); </span><span> } </span><span> 
</span><span> </span><span style="color:#b4cea8;">IDT</span><span>.load(); </span><span>} </span></code></pre> <p>We first set the selectors to <code>SegmentSelector(0)</code> (the null selector) and then update them from inside the closure (which implicitly borrows them as <code>&amp;mut</code>). Now we’re able to reload the code segment register using <a href="https://docs.rs/x86_64/0.1.2/x86_64/instructions/segmentation/fn.set_cs.html"><code>set_cs</code></a> and to load the TSS using <a href="https://docs.rs/x86_64/0.1.2/x86_64/instructions/tables/fn.load_tss.html"><code>load_tss</code></a>.</p> <p>Now that we loaded a valid TSS and interrupt stack table, we can set the stack index for our double fault handler in the IDT:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>lazy_static! { </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">IDT</span><span>: idt::Idt = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = idt::Idt::new(); </span><span> </span><span style="color:#569cd6;">... </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> idt.double_fault.set_handler_fn(double_fault_handler) </span><span> .set_stack_index(</span><span style="color:#b4cea8;">DOUBLE_FAULT_IST_INDEX </span><span style="color:#569cd6;">as u16</span><span>); </span><span> } </span><span> </span><span style="color:#569cd6;">... </span><span> }; </span><span>} </span></code></pre> <p>The <code>set_stack_index</code> method is unsafe because the caller must ensure that the used index is valid and not already used for another exception.</p> <p>That’s it! Now the CPU should switch to the double fault stack whenever a double fault occurs. 
Thus, we are able to catch <em>all</em> double faults, including kernel stack overflows:</p> <p><img src="https://os.phil-opp.com/double-faults/qemu-double-fault-on-stack-overflow.png" alt="QEMU printing EXCEPTION: DOUBLE FAULT and a dump of the exception stack frame" /></p> <p>From now on we should never see a triple fault again!</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>Now that we have mastered exceptions, it’s time to explore another kind of interrupt: interrupts from external devices such as timers, keyboards, or network controllers. These hardware interrupts are very similar to exceptions, e.g. they are also dispatched through the IDT.</p> <p>However, unlike exceptions, they don’t arise directly on the CPU. Instead, an <em>interrupt controller</em> aggregates these interrupts and forwards them to the CPU depending on their priority. In the next posts we will explore the two interrupt controller variants on x86: the <a href="https://en.wikipedia.org/wiki/Intel_8259">Intel 8259</a> (“PIC”) and the <a href="https://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller">APIC</a>. This will allow us to react to keyboard and mouse input.</p> Returning from Exceptions Wed, 21 Sep 2016 00:00:00 +0000 https://os.phil-opp.com/returning-from-exceptions/ https://os.phil-opp.com/returning-from-exceptions/ <p>In this post, we learn how to return from exceptions correctly. In the course of this, we will explore the <code>iretq</code> instruction, the C calling convention, multimedia registers, and the red zone.</p> <span id="continue-reading"></span> <p>As always, the complete source code is on <a href="https://github.com/phil-opp/blog_os/tree/returning_from_exceptions">GitHub</a>. Please file <a href="https://github.com/phil-opp/blog_os/issues">issues</a> for any problems, questions, or improvement suggestions. 
There is also a <a href="https://gitter.im/phil-opp/blog_os">gitter chat</a> and a comment section at the end of this page.</p> <blockquote> <p><strong>Note</strong>: This post describes how to handle exceptions using naked functions (see <a href="https://os.phil-opp.com/edition-1/extra/naked-exceptions/">“Handling Exceptions with Naked Functions”</a> for an overview). Our new way of handling exceptions can be found in the <a href="https://os.phil-opp.com/handling-exceptions/">“Handling Exceptions”</a> post.</p> </blockquote> <h2 id="introduction"><a class="zola-anchor" href="#introduction" aria-label="Anchor link for: introduction">🔗</a>Introduction</h2> <p>Most exceptions are fatal and can’t be resolved. For example, we can’t return from a divide-by-zero exception in a reasonable way. However, there are some exceptions that we can resolve:</p> <p>Imagine a system that uses <a href="https://en.wikipedia.org/wiki/Memory-mapped_file">memory mapped files</a>: We map a file into the virtual address space without loading it into memory. Whenever we access a part of the file for the first time, a page fault occurs. However, this page fault is not fatal. We can resolve it by loading the corresponding page from disk into memory and setting the <code>present</code> flag in the page table. Then we can return from the page fault handler and restart the failed instruction, which now successfully accesses the file data.</p> <p>Memory mapped files are completely out of scope for us right now (we have neither a file concept nor a hard disk driver). So we need an exception that we can resolve easily so that we can return from it in a reasonable way. 
Fortunately, there is an exception that needs no resolution at all: the breakpoint exception.</p> <h2 id="the-breakpoint-exception"><a class="zola-anchor" href="#the-breakpoint-exception" aria-label="Anchor link for: the-breakpoint-exception">🔗</a>The Breakpoint Exception</h2> <p>The breakpoint exception is the perfect exception to test our upcoming return-from-exception logic. Its only purpose is to temporarily pause a program when the breakpoint instruction <code>int3</code> is executed.</p> <p>The breakpoint exception is commonly used in debuggers: When the user sets a breakpoint, the debugger overwrites the corresponding instruction with the <code>int3</code> instruction so that the CPU throws the breakpoint exception when it reaches that line. When the user wants to continue the program, the debugger replaces the <code>int3</code> instruction with the original instruction again and continues the program. For more details, see the <a href="https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints">How debuggers work</a> series.</p> <p>For our use case, we don’t need to overwrite any instructions (it wouldn’t even be possible since we <a href="https://os.phil-opp.com/remap-the-kernel/#using-the-correct-flags">set the page table flags</a> to read-only). 
Instead, we just want to print a message when the breakpoint instruction is executed and then continue the program.</p> <h3 id="catching-breakpoints"><a class="zola-anchor" href="#catching-breakpoints" aria-label="Anchor link for: catching-breakpoints">🔗</a>Catching Breakpoints</h3> <p>Let’s start by defining a handler function for the breakpoint exception:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>breakpoint_handler(stack_frame: </span><span style="color:#569cd6;">&amp;</span><span>ExceptionStackFrame) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> stack_frame = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;</span><span>*stack_frame }; </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">EXCEPTION: BREAKPOINT at </span><span style="color:#b4cea8;">{:#x}</span><span style="color:#e3bbab;">\n</span><span style="color:#b4cea8;">{:#?}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span> stack_frame.instruction_pointer, stack_frame); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>We print an error message and also output the instruction pointer and the rest of the stack frame. 
Note that this function does <em>not</em> return yet, since our <code>handler!</code> macro still requires a diverging function.</p> <p>We need to register our new handler function in the interrupt descriptor table (IDT):</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>lazy_static! { </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">IDT</span><span>: idt::Idt = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = idt::Idt::new(); </span><span> </span><span> idt.set_handler(</span><span style="color:#b5cea8;">0</span><span>, handler!(divide_by_zero_handler)); </span><span> idt.set_handler(</span><span style="color:#b5cea8;">3</span><span>, handler!(breakpoint_handler)); </span><span style="color:#608b4e;">// new </span><span> idt.set_handler(</span><span style="color:#b5cea8;">6</span><span>, handler!(invalid_opcode_handler)); </span><span> idt.set_handler(</span><span style="color:#b5cea8;">14</span><span>, handler_with_error_code!(page_fault_handler)); </span><span> </span><span> idt </span><span> }; </span><span>} </span></code></pre> <p>We set the IDT entry with number 3 since it’s the vector number of the breakpoint exception.</p> <h4 id="testing-it"><a class="zola-anchor" href="#testing-it" aria-label="Anchor link for: testing-it">🔗</a>Testing it</h4> <p>In order to test it, we insert an <code>int3</code> instruction in our <code>rust_main</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span style="color:#569cd6;">... 
</span><span>#[macro_use] </span><span style="color:#608b4e;">// needed for the `int!` macro </span><span style="color:#569cd6;">extern crate</span><span> x86_64; </span><span style="color:#569cd6;">... </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>rust_main(...) { </span><span> </span><span style="color:#569cd6;">... </span><span> interrupts::init(); </span><span> </span><span> </span><span style="color:#608b4e;">// trigger a breakpoint exception </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ int!(</span><span style="color:#b5cea8;">3</span><span>) }; </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>When we execute <code>make run</code>, we see the following:</p> <p><img src="https://os.phil-opp.com/returning-from-exceptions/qemu-breakpoint-handler.png" alt="QEMU showing EXCEPTION: BREAKPOINT at 0x110970 and a dump of the exception stack frame" /></p> <p>It works! Now we “just” need to return from the breakpoint handler somehow so that we see the <code>It did not crash</code> message again.</p> <h2 id="returning-from-exceptions"><a class="zola-anchor" href="#returning-from-exceptions" aria-label="Anchor link for: returning-from-exceptions">🔗</a>Returning from Exceptions</h2> <p>So how do we return from exceptions? To make it easier, we look at a normal function return first:</p> <p><img src="https://os.phil-opp.com/returning-from-exceptions/function-stack-frame.svg" alt="function stack frame" /></p> <p>When calling a function, the <code>call</code> instruction pushes the return address on the stack. 
When the called function is finished, it can return to the parent function through the <code>ret</code> instruction, which pops the return address from the stack and then jumps to it.</p> <p>The exception stack frame, in contrast, looks a bit different:</p> <p><img src="https://os.phil-opp.com/returning-from-exceptions/exception-stack-frame.svg" alt="exception stack frame" /></p> <p>Instead of pushing a return address, the CPU pushes the stack and instruction pointers (with their segment selectors), the RFLAGS register, and an optional error code. It also aligns the stack pointer to a 16-byte boundary before pushing values.</p> <p>So we can’t use a normal <code>ret</code> instruction, since it expects a different stack frame layout. Instead, there is a special instruction for returning from exceptions: <code>iretq</code>.</p> <h3 id="the-iretq-instruction"><a class="zola-anchor" href="#the-iretq-instruction" aria-label="Anchor link for: the-iretq-instruction">🔗</a>The <code>iretq</code> Instruction</h3> <p>The <code>iretq</code> instruction is the one and only way to return from exceptions and is specifically designed for this purpose. The AMD64 instruction manual (<a href="https://www.amd.com/system/files/TechDocs/24594.pdf">PDF</a>) even demands that <code>iretq</code> “<em>must</em> be used to terminate the exception or interrupt handler associated with the exception”.</p> <p>IRETQ restores <code>rip</code>, <code>cs</code>, <code>rflags</code>, <code>rsp</code>, and <code>ss</code> from the values saved on the stack and thus continues the interrupted program. The instruction does not handle the optional error code, so the handler must pop it from the stack before executing <code>iretq</code>.</p> <p>We see that <code>iretq</code> treats the stored instruction pointer as the return address. For most exceptions, the stored <code>rip</code> points to the instruction that caused the fault. So by executing <code>iretq</code>, we restart the failing instruction. 
This makes sense because we should have resolved the exception when returning from it, so the instruction should no longer fail (e.g. the accessed part of the memory mapped file is now present in memory).</p> <p>The situation is a bit different for the breakpoint exception, since it needs no resolution. Restarting the <code>int3</code> instruction wouldn’t make sense, since it would cause a new breakpoint exception and we would enter an endless loop. For this reason the hardware designers decided that the stored <code>rip</code> should point to the next instruction after the <code>int3</code> instruction.</p> <p>Let’s check this for our breakpoint handler. Remember, the handler printed the following message (see the image above):</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>EXCEPTION: BREAKPOINT at 0x110970 </span></code></pre> <p>So let’s disassemble the instruction at <code>0x110970</code> and its predecessor:</p> <pre data-lang="bash" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-bash "><code class="language-bash" data-lang="bash"><span>&gt; objdump -d build/kernel-x86_64.bin </span><span style="color:#569cd6;">| </span><span>grep -B1 </span><span style="color:#d69d85;">&quot;110970:&quot; </span><span>11096f: cc int3 </span><span>110970: 48 c7 01 2a 00 00 00 movq $0x2a,(%rcx) </span></code></pre> <p>We see that <code>0x110970</code> indeed points to the next instruction after <code>int3</code>. 
So we can simply jump to the stored instruction pointer when we want to return from the breakpoint exception.</p> <h3 id="implementation"><a class="zola-anchor" href="#implementation" aria-label="Anchor link for: implementation">🔗</a>Implementation</h3> <p>Let’s update our <code>handler!</code> macro to support non-diverging exception handlers:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>macro_rules! handler { </span><span> ($name: ident) </span><span style="color:#569cd6;">=&gt; </span><span>{{ </span><span> #[naked] </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>wrapper() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> asm!(</span><span style="color:#d69d85;">&quot;mov rdi, rsp </span><span style="color:#d69d85;"> sub rsp, 8 // align the stack pointer </span><span style="color:#d69d85;"> call $0&quot; </span><span> :: </span><span style="color:#d69d85;">&quot;i&quot;</span><span>($name </span><span style="color:#569cd6;">as extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn</span><span>( </span><span> </span><span style="color:#569cd6;">&amp;</span><span>ExceptionStackFrame)) </span><span style="color:#608b4e;">// no longer diverging </span><span> : </span><span style="color:#d69d85;">&quot;rdi&quot; </span><span>: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>, </span><span style="color:#d69d85;">&quot;volatile&quot;</span><span>); </span><span> </span><span> </span><span style="color:#608b4e;">// new </span><span> asm!(</span><span style="color:#d69d85;">&quot;add rsp, 8 // undo stack pointer alignment 
</span><span style="color:#d69d85;"> iretq&quot; </span><span> :::: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>, </span><span style="color:#d69d85;">&quot;volatile&quot;</span><span>); </span><span> ::core::intrinsics::unreachable(); </span><span> } </span><span> } </span><span> wrapper </span><span> }} </span><span>} </span></code></pre> <p>When an exception handler returns from the <code>call</code> instruction, we use the <code>iretq</code> instruction to continue the interrupted program. Note that we need to undo the stack pointer alignment first, so that <code>rsp</code> points to the end of the exception stack frame again.</p> <p>We’ve changed the handler function type, so we need to adjust our existing exception handlers:</p> <pre data-lang="diff" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-diff "><code class="language-diff" data-lang="diff"><span>// in src/interrupts/mod.rs </span><span> </span><span>extern &quot;C&quot; fn divide_by_zero_handler( </span><span>- stack_frame: &amp;ExceptionStackFrame) -&gt; ! {...} </span><span>+ stack_frame: &amp;ExceptionStackFrame) {...} </span><span> </span><span>extern &quot;C&quot; fn invalid_opcode_handler( </span><span>- stack_frame: &amp;ExceptionStackFrame) -&gt; ! {...} </span><span>+ stack_frame: &amp;ExceptionStackFrame) {...} </span><span> </span><span>extern &quot;C&quot; fn breakpoint_handler( </span><span>- stack_frame: &amp;ExceptionStackFrame) -&gt; ! { </span><span>+ stack_frame: &amp;ExceptionStackFrame) { </span><span> println!(...); </span><span>- loop {} </span><span>} </span></code></pre> <p>Note that we also removed the <code>loop {}</code> at the end of our <code>breakpoint_handler</code> so that it no longer diverges. 
The <code>divide_by_zero_handler</code> and the <code>invalid_opcode_handler</code> still diverge (although the new function type would allow them to return).</p> <h3 id="testing"><a class="zola-anchor" href="#testing" aria-label="Anchor link for: testing">🔗</a>Testing</h3> <p>Let’s try our new <code>iretq</code> logic:</p> <p><img src="https://os.phil-opp.com/returning-from-exceptions/qemu-breakpoint-return-page-fault.png" alt="QEMU output with EXCEPTION BREAKPOINT and EXCEPTION PAGE FAULT but no It did not crash" /></p> <p>Instead of the expected <em>“It did not crash”</em> message after the breakpoint exception, we get a page fault. The strange thing is that our kernel tried to access address <code>0x1</code>, which should never happen. So it seems like we messed up something important.</p> <h3 id="debugging"><a class="zola-anchor" href="#debugging" aria-label="Anchor link for: debugging">🔗</a>Debugging</h3> <p>Let’s debug it using GDB. For that we execute <code>make debug</code> in one terminal (which starts QEMU with the <code>-s -S</code> flags) and then <code>make gdb</code> (which starts and connects GDB) in a second terminal. For more information about GDB debugging, check out our <a href="https://os.phil-opp.com/set-up-gdb/">Set Up GDB</a> guide.</p> <p>First we want to check if our <code>iretq</code> was successful. To do so, we set a breakpoint on the <code>println!("It did not crash!")</code> statement in <code>src/lib.rs</code>. Let’s assume that it’s on line 61:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>(gdb) break blog_os/src/lib.rs:61 </span><span>Breakpoint 1 at 0x110a95: file /home/.../blog_os/src/lib.rs, line 61. </span></code></pre> <p>This line is after the <code>int3</code> instruction, so we know that the <code>iretq</code> succeeded when the breakpoint is hit. To test this, we continue the execution:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>(gdb) continue </span><span>Continuing. 
</span><span> </span><span>Breakpoint 1, blog_os::rust_main (multiboot_information_address=1539136) </span><span> at /home/.../blog_os/src/lib.rs:61 </span><span>61 println!(&quot;It did not crash!&quot;); </span><span> </span></code></pre> <p>It worked! So our kernel successfully returned from the <code>int3</code> instruction, which means that the <code>iretq</code> itself works.</p> <p>However, when we <code>continue</code> the execution again, we get the page fault. So the exception occurs somewhere in the <code>println</code> logic. This means that it occurs in code generated by the compiler (and not e.g. in inline assembly). But the compiler should never access <code>0x1</code>, so how is this happening?</p> <p>The answer is that we’ve used the wrong <em>calling convention</em> for our exception handlers. As a result, we violate compiler invariants, so code that works fine without intermediate exceptions starts to violate memory safety when it’s executed after a breakpoint exception.</p> <h2 id="calling-conventions"><a class="zola-anchor" href="#calling-conventions" aria-label="Anchor link for: calling-conventions">🔗</a>Calling Conventions</h2> <p>Exceptions are quite similar to function calls: The CPU jumps to the first instruction of the (handler) function and executes the function. Afterwards, if the function is not diverging, the CPU jumps to the return address and continues the execution of the parent function.</p> <p>However, there is a major difference between exceptions and function calls: A function call is invoked voluntarily by a compiler-inserted <code>call</code> instruction, while an exception might occur at <em>any</em> instruction. In order to understand the consequences of this difference, we need to examine function calls in more detail.</p> <p><a href="https://en.wikipedia.org/wiki/Calling_convention">Calling conventions</a> specify the details of a function call. For example, they specify where function parameters are placed (e.g. 
in registers or on the stack) and how results are returned. On x86_64 Linux, the following rules apply to C functions (specified in the <a href="https://refspecs.linuxbase.org/elf/gabi41.pdf">System V ABI</a>):</p> <ul> <li>the first six integer arguments are passed in registers <code>rdi</code>, <code>rsi</code>, <code>rdx</code>, <code>rcx</code>, <code>r8</code>, <code>r9</code></li> <li>additional arguments are passed on the stack</li> <li>results are returned in <code>rax</code> and <code>rdx</code></li> </ul> <p>Note that Rust does not follow the C ABI (in fact, <a href="https://github.com/rust-lang/rfcs/issues/600">there isn’t even a Rust ABI yet</a>). So these rules apply only to functions declared as <code>extern "C" fn</code>.</p> <h3 id="preserved-and-scratch-registers"><a class="zola-anchor" href="#preserved-and-scratch-registers" aria-label="Anchor link for: preserved-and-scratch-registers">🔗</a>Preserved and Scratch Registers</h3> <p>The calling convention divides the registers into two groups: <em>preserved</em> and <em>scratch</em> registers.</p> <p>The values of the preserved registers must remain unchanged across function calls. So a called function (the <em>“callee”</em>) is only allowed to overwrite these registers if it restores their original values before returning. Therefore these registers are called <em>“callee-saved”</em>. A common pattern is to save these registers to the stack at the function’s beginning and restore them just before returning.</p> <p>In contrast, a called function is allowed to overwrite scratch registers without restrictions. If the caller wants to preserve the value of a scratch register across a function call, it needs to back up and restore it (e.g. by pushing it to the stack before the function call). 
So the scratch registers are <em>caller-saved</em>.</p> <p>On x86_64, the C calling convention specifies the following preserved and scratch registers:</p> <table><thead><tr><th>preserved registers</th><th>scratch registers</th></tr></thead><tbody> <tr><td><code>rbp</code>, <code>rbx</code>, <code>rsp</code>, <code>r12</code>, <code>r13</code>, <code>r14</code>, <code>r15</code></td><td><code>rax</code>, <code>rcx</code>, <code>rdx</code>, <code>rsi</code>, <code>rdi</code>, <code>r8</code>, <code>r9</code>, <code>r10</code>, <code>r11</code></td></tr> <tr><td><em>callee-saved</em></td><td><em>caller-saved</em></td></tr> </tbody></table> <p>The compiler knows these rules, so it generates the code accordingly. For example, most functions begin with a <code>push rbp</code>, which backs up <code>rbp</code> on the stack (because it’s a callee-saved register).</p> <h3 id="the-exception-calling-convention"><a class="zola-anchor" href="#the-exception-calling-convention" aria-label="Anchor link for: the-exception-calling-convention">🔗</a>The Exception Calling Convention</h3> <p>In contrast to function calls, exceptions can occur on <em>any</em> instruction. In most cases we don’t even know at compile time if the generated code will cause an exception. For example, the compiler can’t know if an instruction causes a stack overflow or another page fault.</p> <p>Since we don’t know when an exception occurs, we can’t back up any registers beforehand. This means that we can’t use a calling convention that relies on caller-saved registers for our exception handlers. 
But we do so at the moment: Our exception handlers are declared as <code>extern "C" fn</code> and thus use the C calling convention.</p> <p>So here is what happens:</p> <ul> <li><code>rust_main</code> is executing; it writes some memory address into <code>rax</code>.</li> <li>The <code>int3</code> instruction causes a breakpoint exception.</li> <li>Our <code>breakpoint_handler</code> prints to the screen and assumes that it can overwrite <code>rax</code> freely (since it’s a scratch register). Somehow the value <code>1</code> ends up in <code>rax</code>.</li> <li>We return from the breakpoint exception using <code>iretq</code>.</li> <li><code>rust_main</code> continues and accesses the memory address in <code>rax</code>.</li> <li>The CPU tries to access address <code>0x1</code>, which causes a page fault.</li> </ul> <p>So our exception handler erroneously assumes that the scratch registers were saved by the caller. But the caller (<code>rust_main</code>) couldn’t save any registers since it didn’t know that an exception would occur. So nobody saves <code>rax</code> and the other scratch registers, which leads to the page fault.</p> <p>The problem is that we use a calling convention with caller-saved registers for our exception handlers. Instead, we need a calling convention that preserves <em>all registers</em>. In other words, all registers must be callee-saved:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;all-registers-callee-saved&quot; </span><span style="color:#569cd6;">fn </span><span>exception_handler() {</span><span style="color:#569cd6;">...</span><span>} </span></code></pre> <p>Unfortunately, Rust does not support such a calling convention. It was <a href="https://github.com/rust-lang/rfcs/pull/1275">proposed once</a>, but did not get accepted for various reasons. 
The primary reason was that such calling conventions can be simulated by writing a naked wrapper function.</p> <p>(Remember: <a href="https://github.com/rust-lang/rfcs/blob/master/text/1201-naked-fns.md">Naked functions</a> are functions without prologue and can contain only inline assembly. They were discussed in the <a href="https://os.phil-opp.com/better-exception-messages/#naked-functions">previous post</a>.)</p> <h3 id="a-naked-wrapper-function"><a class="zola-anchor" href="#a-naked-wrapper-function" aria-label="Anchor link for: a-naked-wrapper-function">🔗</a>A naked wrapper function</h3> <p>Such a naked wrapper function might look like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[naked] </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>calling_convention_wrapper() { </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> asm!(</span><span style="color:#d69d85;">&quot; </span><span style="color:#d69d85;"> push rax </span><span style="color:#d69d85;"> push rcx </span><span style="color:#d69d85;"> push rdx </span><span style="color:#d69d85;"> push rsi </span><span style="color:#d69d85;"> push rdi </span><span style="color:#d69d85;"> push r8 </span><span style="color:#d69d85;"> push r9 </span><span style="color:#d69d85;"> push r10 </span><span style="color:#d69d85;"> push r11 </span><span style="color:#d69d85;"> // TODO: call exception handler with C calling convention </span><span style="color:#d69d85;"> pop r11 </span><span style="color:#d69d85;"> pop r10 </span><span style="color:#d69d85;"> pop r9 </span><span style="color:#d69d85;"> pop r8 </span><span style="color:#d69d85;"> pop rdi </span><span style="color:#d69d85;"> pop rsi </span><span style="color:#d69d85;"> pop rdx </span><span style="color:#d69d85;"> pop rcx 
</span><span style="color:#d69d85;"> pop rax </span><span style="color:#d69d85;"> &quot; </span><span>:::: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>, </span><span style="color:#d69d85;">&quot;volatile&quot;</span><span>); </span><span> } </span><span>} </span></code></pre> <p>This wrapper function saves all <em>scratch</em> registers to the stack before calling the exception handler and restores them afterwards. Note that we <code>pop</code> the registers in reverse order.</p> <p>We don’t need to back up <em>preserved</em> registers since they are callee-saved in the C calling convention. Thus, the compiler already takes care of preserving their values.</p> <h3 id="fixing-our-handler-macro"><a class="zola-anchor" href="#fixing-our-handler-macro" aria-label="Anchor link for: fixing-our-handler-macro">🔗</a>Fixing our Handler Macro</h3> <p>Let’s update our handler macro to fix the calling convention problem. To do so, we need to back up and restore all scratch registers. For that we create two new macros:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>macro_rules! 
save_scratch_registers { </span><span> () </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> asm!(</span><span style="color:#d69d85;">&quot;push rax </span><span style="color:#d69d85;"> push rcx </span><span style="color:#d69d85;"> push rdx </span><span style="color:#d69d85;"> push rsi </span><span style="color:#d69d85;"> push rdi </span><span style="color:#d69d85;"> push r8 </span><span style="color:#d69d85;"> push r9 </span><span style="color:#d69d85;"> push r10 </span><span style="color:#d69d85;"> push r11 </span><span style="color:#d69d85;"> &quot; </span><span>:::: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>, </span><span style="color:#d69d85;">&quot;volatile&quot;</span><span>); </span><span> } </span><span>} </span><span> </span><span>macro_rules! restore_scratch_registers { </span><span> () </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> asm!(</span><span style="color:#d69d85;">&quot;pop r11 </span><span style="color:#d69d85;"> pop r10 </span><span style="color:#d69d85;"> pop r9 </span><span style="color:#d69d85;"> pop r8 </span><span style="color:#d69d85;"> pop rdi </span><span style="color:#d69d85;"> pop rsi </span><span style="color:#d69d85;"> pop rdx </span><span style="color:#d69d85;"> pop rcx </span><span style="color:#d69d85;"> pop rax </span><span style="color:#d69d85;"> &quot; </span><span>:::: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>, </span><span style="color:#d69d85;">&quot;volatile&quot;</span><span>); </span><span> } </span><span>} </span></code></pre> <p>We need to declare these macros <em>above</em> our <code>handler</code> macro, since macros are only available after their declaration.</p> <p>Now we can use these macros to fix our <code>handler!</code> macro:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in 
src/interrupts/mod.rs </span><span> </span><span>macro_rules! handler { </span><span> ($name: ident) </span><span style="color:#569cd6;">=&gt; </span><span>{{ </span><span> #[naked] </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>wrapper() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> save_scratch_registers!(); </span><span> asm!(</span><span style="color:#d69d85;">&quot;mov rdi, rsp </span><span style="color:#d69d85;"> add rdi, 9*8 // calculate exception stack frame pointer </span><span style="color:#d69d85;"> // sub rsp, 8 (stack is aligned already) </span><span style="color:#d69d85;"> call $0&quot; </span><span> :: </span><span style="color:#d69d85;">&quot;i&quot;</span><span>($name </span><span style="color:#569cd6;">as </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn</span><span>(</span><span style="color:#569cd6;">&amp;</span><span>ExceptionStackFrame)) </span><span> : </span><span style="color:#d69d85;">&quot;rdi&quot; </span><span>: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>, </span><span style="color:#d69d85;">&quot;volatile&quot;</span><span>); </span><span> </span><span> restore_scratch_registers!(); </span><span> asm!(</span><span style="color:#d69d85;">&quot; </span><span style="color:#d69d85;"> // add rsp, 8 (undo stack alignment; not needed anymore) </span><span style="color:#d69d85;"> iretq&quot; </span><span> :::: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>, </span><span style="color:#d69d85;">&quot;volatile&quot;</span><span>); </span><span> ::core::intrinsics::unreachable(); </span><span> } </span><span> } </span><span> wrapper </span><span> }} </span><span>} </span></code></pre> <p>It’s important 
that we save the registers first, before we modify any of them. After the <code>call</code> instruction (but before <code>iretq</code>) we restore the registers again. Because we’re now changing <code>rsp</code> (by pushing the register values) before we load it into <code>rdi</code>, we would get a wrong exception stack frame pointer. Therefore we need to adjust it by adding the number of bytes we push. We push 9 registers that are 8 bytes each, so <code>9 * 8</code> bytes in total.</p> <p>Note that we no longer need to manually align the stack pointer, because we’re pushing an odd number of registers in <code>save_scratch_registers</code>. The CPU aligns the stack and then pushes five 8-byte values, which leaves <code>rsp</code> 8 bytes off a 16-byte boundary. The nine additional pushes bring the total to 14 values, or 112 bytes, which is a multiple of 16. Thus the stack pointer already has the required 16-byte alignment.</p> <h3 id="testing-it-again"><a class="zola-anchor" href="#testing-it-again" aria-label="Anchor link for: testing-it-again">🔗</a>Testing it again</h3> <p>Let’s test it again with our corrected <code>handler!</code> macro:</p> <p><img src="https://os.phil-opp.com/returning-from-exceptions/qemu-breakpoint-return.png" alt="QEMU output with EXCEPTION BREAKPOINT and It did not crash" /></p> <p>The page fault is gone and we see the <em>“It did not crash”</em> message again!</p> <p>So the page fault occurred because our exception handler didn’t preserve the scratch register <code>rax</code>. Our new <code>handler!</code> macro fixes this problem by saving all scratch registers (including <code>rax</code>) before calling exception handlers. 
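</p> <p>The bookkeeping done by the corrected <code>handler!</code> macro can be replayed in ordinary user-space Rust. This is only a toy model under stated assumptions, not kernel code: the names <code>Machine</code>, <code>push</code>, <code>pop</code>, <code>enter_exception</code>, and <code>wrapper</code> are hypothetical, as are the stack address and the placeholder values for <code>ss</code>, <code>rflags</code>, and <code>cs</code>. It assumes an exception without an error code.</p>

```rust
// Toy user-space model of the fixed `handler!` wrapper (all names hypothetical).

const WORD: u64 = 8; // every push or pop moves rsp by 8 bytes
const STACK_TOP: u64 = 0x8000; // arbitrary 16-byte aligned start address

#[derive(Clone, Copy)]
struct Machine {
    rsp: u64,
    scratch: [u64; 9], // rax, rcx, rdx, rsi, rdi, r8, r9, r10, r11
    stack: [u64; 16],  // slot i models the word at STACK_TOP - 8 * (i + 1)
}

fn push(mut m: Machine, value: u64) -> Machine {
    m.rsp -= WORD;
    m.stack[((STACK_TOP - m.rsp) / WORD - 1) as usize] = value;
    m
}

fn pop(mut m: Machine) -> (Machine, u64) {
    let value = m.stack[((STACK_TOP - m.rsp) / WORD - 1) as usize];
    m.rsp += WORD;
    (m, value)
}

// What the CPU does on an exception without error code: align rsp to
// 16 bytes, then push ss, rsp, rflags, cs, and rip (placeholder values).
fn enter_exception(mut m: Machine, rip: u64) -> Machine {
    let old_rsp = m.rsp;
    m.rsp -= m.rsp % 16;
    for value in [0x10, old_rsp, 0x2, 0x8, rip] {
        m = push(m, value);
    }
    m
}

// Returns the machine after the wrapper, the pointer passed in rdi, and
// whether rsp was 16-byte aligned at the simulated `call`.
fn wrapper(mut m: Machine) -> (Machine, u64, bool) {
    // save_scratch_registers!()
    for i in 0..9 {
        m = push(m, m.scratch[i]);
    }
    // mov rdi, rsp; add rdi, 9*8
    let frame_ptr = m.rsp + 9 * WORD;
    let aligned = m.rsp % 16 == 0;
    // The extern "C" handler is free to clobber every scratch register.
    m.scratch = [0; 9];
    // restore_scratch_registers!(): pop in reverse order
    for i in (0..9).rev() {
        let (next, value) = pop(m);
        m = next;
        m.scratch[i] = value;
    }
    (m, frame_ptr, aligned)
}

fn main() {
    let mut m = Machine { rsp: STACK_TOP, scratch: [0; 9], stack: [0; 16] };
    m.scratch[0] = 0x110970; // pretend rax holds a valid address
    let entered = enter_exception(m, 0x110970);
    let (after, frame_ptr, aligned) = wrapper(entered);
    assert!(aligned); // 5 * 8 + 9 * 8 = 112 bytes, a multiple of 16
    assert_eq!(frame_ptr, entered.rsp); // rdi points at the exception stack frame
    assert_eq!(after.rsp, entered.rsp); // iretq sees the frame again
    assert_eq!(after.scratch, m.scratch); // the scratch registers survived
    println!("rax after the handler: {:#x}", after.scratch[0]);
}
```

<p>In this model the simulated handler zeroes every scratch register, yet the pops in reverse order bring back the original values, and the pointer computed as <code>rsp + 9 * 8</code> equals the value of <code>rsp</code> at handler entry.</p> <p>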
Thus, <code>rax</code> still contains the valid memory address when <code>rust_main</code> continues execution.</p> <h2 id="multimedia-registers"><a class="zola-anchor" href="#multimedia-registers" aria-label="Anchor link for: multimedia-registers">🔗</a>Multimedia Registers</h2> <p>When we discussed calling conventions above, we assumed that an x86_64 CPU only has the following 16 registers: <code>rax</code>, <code>rbx</code>, <code>rcx</code>, <code>rdx</code>, <code>rsi</code>, <code>rdi</code>, <code>rsp</code>, <code>rbp</code>, <code>r8</code>, <code>r9</code>, <code>r10</code>, <code>r11</code>, <code>r12</code>, <code>r13</code>, <code>r14</code>, and <code>r15</code>. These registers are called <em>general purpose registers</em> since each of them can be used for arithmetic and load/store instructions.</p> <p>However, modern CPUs also have a set of <em>special purpose registers</em>, which can be used to improve performance in several use cases. On x86_64, the most important set of special purpose registers is the <em>multimedia registers</em>. These registers are larger than the general purpose registers and can be used to speed up audio/video processing or matrix calculations. For example, we could use them to add two 4-dimensional vectors <em>in a single CPU instruction</em>:</p> <p><img src="https://os.phil-opp.com/returning-from-exceptions/vector-addition.png" alt="(1,2,3,4) + (5,6,7,8) = (6,8,10,12)" /></p> <p>Such multimedia instructions are called <a href="https://en.wikipedia.org/wiki/SIMD">Single Instruction Multiple Data (SIMD)</a> instructions, because they simultaneously perform an operation (e.g. addition) on multiple data words. Good compilers are able to transform normal loops into such SIMD code automatically. 
This process is called <a href="https://en.wikipedia.org/wiki/Automatic_vectorization">auto-vectorization</a> and can lead to huge performance improvements.</p> <p>However, auto-vectorization causes a problem for us: Most of the multimedia registers are caller-saved. According to our discussion of calling conventions above, this means that our exception handlers erroneously assume that they are allowed to overwrite them without preserving their values.</p> <p>We don’t use any multimedia registers explicitly, but the Rust compiler might auto-vectorize our code (including the exception handlers). Thus we could silently clobber the multimedia registers, which leads to the same problems as above:</p> <p><img src="https://os.phil-opp.com/returning-from-exceptions/xmm-overwrite.svg" alt="example: program uses mm0, mm1, and mm2. Then the exception handler clobbers mm1." /></p> <p>This example shows a program that is using the first three multimedia registers (<code>mm0</code> to <code>mm2</code>). At some point, an exception occurs and control is transferred to the exception handler. The exception handler uses <code>mm1</code> for its own data and thus overwrites the previous value. When the exception is resolved, the CPU continues the interrupted program again. However, the program’s state is now corrupted since it relies on the original <code>mm1</code> value.</p> <h3 id="saving-and-restoring-multimedia-registers"><a class="zola-anchor" href="#saving-and-restoring-multimedia-registers" aria-label="Anchor link for: saving-and-restoring-multimedia-registers">🔗</a>Saving and Restoring Multimedia Registers</h3> <p>In order to fix this problem, we need to back up all caller-saved multimedia registers before we call the exception handler. The problem is that the set of multimedia registers varies between CPUs. 
There are different standards:</p> <ul> <li><a href="https://en.wikipedia.org/wiki/MMX_(instruction_set)">MMX</a>: The MMX instruction set was introduced in 1997 and defines eight 64-bit registers called <code>mm0</code> through <code>mm7</code>. These registers are just aliases for the registers of the <a href="https://en.wikipedia.org/wiki/X87">x87 floating point unit</a>.</li> <li><a href="https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</a>: The <em>Streaming SIMD Extensions</em> instruction set was introduced in 1999. Instead of re-using the floating point registers, it adds a completely new register set. The sixteen new registers are called <code>xmm0</code> through <code>xmm15</code> and are 128 bits each.</li> <li><a href="https://en.wikipedia.org/wiki/Advanced_Vector_Extensions">AVX</a>: The <em>Advanced Vector Extensions</em> are extensions that further increase the size of the multimedia registers. The new registers are called <code>ymm0</code> through <code>ymm15</code> and are 256 bits each. They extend the <code>xmm</code> registers, so e.g. <code>xmm0</code> is the lower half of <code>ymm0</code>.</li> </ul> <p>The Rust compiler (and LLVM) assume that the <code>x86_64-unknown-linux-gnu</code> target supports only MMX and SSE, so we don’t need to save <code>ymm0</code> through <code>ymm15</code>. But we need to save <code>xmm0</code> through <code>xmm15</code> and also <code>mm0</code> through <code>mm7</code>. There is a special instruction to do this: <a href="https://www.felixcloutier.com/x86/fxsave">fxsave</a>. This instruction saves the floating point and multimedia state to a given address. It needs <em>512 bytes</em> to store that state.</p> <p>In order to save/restore the multimedia registers, we <em>could</em> add new macros:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>macro_rules! 
save_multimedia_registers { </span><span> () </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> asm!(</span><span style="color:#d69d85;">&quot;sub rsp, 512 </span><span style="color:#d69d85;"> fxsave [rsp] </span><span style="color:#d69d85;"> &quot; </span><span>:::: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>, </span><span style="color:#d69d85;">&quot;volatile&quot;</span><span>); </span><span> } </span><span>} </span><span> </span><span>macro_rules! restore_multimedia_registers { </span><span> () </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> asm!(</span><span style="color:#d69d85;">&quot;fxrstor [rsp] </span><span style="color:#d69d85;"> add rsp, 512 </span><span style="color:#d69d85;"> &quot; </span><span>:::: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>, </span><span style="color:#d69d85;">&quot;volatile&quot;</span><span>); </span><span> } </span><span>} </span></code></pre> <p>First, we reserve the 512 bytes on the stack and then we use <code>fxsave</code> to back up the multimedia registers. In order to restore them later, we use the <a href="https://www.felixcloutier.com/x86/fxrstor">fxrstor</a> instruction. Note that <code>fxsave</code> and <code>fxrstor</code> require a 16-byte-aligned memory address.</p> <p>However, <em>we won’t do it that way</em>. The problem is the large amount of memory required. We will reuse the same code when we handle hardware interrupts in a future post. So for each mouse click, pressed key, or arriving network packet we would need to write 512 bytes to memory. 
This would be a huge performance problem.</p> <p>Fortunately, there exists an alternative solution.</p> <h3 id="disabling-multimedia-extensions"><a class="zola-anchor" href="#disabling-multimedia-extensions" aria-label="Anchor link for: disabling-multimedia-extensions">🔗</a>Disabling Multimedia Extensions</h3> <p>We just disable MMX, SSE, and all the other fancy multimedia extensions in our kernel<sup class="footnote-reference"><a href="#fn-userspace-sse">1</a></sup>. This way, our exception handlers won’t clobber the multimedia registers because they won’t use them at all.</p> <div class="footnote-definition" id="fn-userspace-sse"><sup class="footnote-definition-label">1</sup> <p>Userspace programs will still be able to use the multimedia registers.</p> </div> <p>This solution has its own disadvantages, of course. For example, it leads to slower kernel code because the compiler can’t perform any auto-vectorization optimizations. But it’s still the faster solution (since we save many memory accesses) and most kernels do it this way (including Linux).</p> <p>So how do we disable MMX and SSE? Well, we just tell the compiler that our target system doesn’t support it. Since the very beginning, we’re compiling our kernel for the <code>x86_64-unknown-linux-gnu</code> target. This worked fine so far, but now we want a different target without support for multimedia extensions. We can do so by creating a <em>target configuration file</em>.</p> <h3 id="target-specifications"><a class="zola-anchor" href="#target-specifications" aria-label="Anchor link for: target-specifications">🔗</a>Target Specifications</h3> <p>In order to disable the multimedia extensions for our kernel, we need to compile for a custom target. We want a target that is equal to <code>x86_64-unknown-linux-gnu</code>, but without MMX and SSE support. 
Rust allows us to specify such a target using a JSON configuration file.</p> <p>A minimal target specification that describes the <code>x86_64-unknown-linux-gnu</code> target looks like this:</p> <pre data-lang="json" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-json "><code class="language-json" data-lang="json"><span>{ </span><span> </span><span style="color:#d69d85;">&quot;llvm-target&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64-unknown-linux-gnu&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;data-layout&quot;</span><span>: </span><span style="color:#d69d85;">&quot;e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-endian&quot;</span><span>: </span><span style="color:#d69d85;">&quot;little&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-pointer-width&quot;</span><span>: </span><span style="color:#d69d85;">&quot;64&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-c-int-width&quot;</span><span>: </span><span style="color:#d69d85;">&quot;32&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;arch&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;os&quot;</span><span>: </span><span style="color:#d69d85;">&quot;none&quot; </span><span>} </span></code></pre> <p>The <code>llvm-target</code> field specifies the target triple that is passed to LLVM. We want to derive a 64-bit Linux target, so we choose <code>x86_64-unknown-linux-gnu</code>. The <code>data-layout</code> field is also passed to LLVM and specifies how data should be laid out in memory. It consists of various specifications separated by a <code>-</code> character. 
For example, the <code>e</code> means little endian and <code>S128</code> specifies that the stack should be aligned to 128 bits (= 16 bytes). The format is described in detail in the <a href="https://llvm.org/docs/LangRef.html#data-layout">LLVM documentation</a> but there shouldn’t be a reason to change this string.</p> <p>The other fields are used for conditional compilation. This allows crate authors to use <code>cfg</code> variables to write special code depending on the OS or the architecture. There isn’t any up-to-date documentation about these fields but the <a href="https://github.com/rust-lang/rust/blob/c772948b687488a087356cb91432425662e034b9/src/librustc_back/target/mod.rs#L194-L214">corresponding source code</a> is quite readable.</p> <h4 id="disabling-mmx-and-sse"><a class="zola-anchor" href="#disabling-mmx-and-sse" aria-label="Anchor link for: disabling-mmx-and-sse">🔗</a>Disabling MMX and SSE</h4> <p>In order to disable the multimedia extensions, we create a new target named <code>x86_64-blog_os</code>. 
To describe this target, we create a file named <code>x86_64-blog_os.json</code> in the project root with the following content:</p> <pre data-lang="json" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-json "><code class="language-json" data-lang="json"><span>{ </span><span> </span><span style="color:#d69d85;">&quot;llvm-target&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64-unknown-linux-gnu&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;data-layout&quot;</span><span>: </span><span style="color:#d69d85;">&quot;e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-endian&quot;</span><span>: </span><span style="color:#d69d85;">&quot;little&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-pointer-width&quot;</span><span>: </span><span style="color:#d69d85;">&quot;64&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-c-int-width&quot;</span><span>: </span><span style="color:#d69d85;">&quot;32&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;arch&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;os&quot;</span><span>: </span><span style="color:#d69d85;">&quot;none&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;features&quot;</span><span>: </span><span style="color:#d69d85;">&quot;-mmx,-sse&quot; </span><span>} </span></code></pre> <p>It’s equal to the <code>x86_64-unknown-linux-gnu</code> target but has one additional option: <code>"features": "-mmx,-sse"</code>. So we added two target <em>features</em>: <code>-mmx</code> and <code>-sse</code>. The minus prefix specifies that our target does <em>not</em> support the feature. 
So by specifying <code>-mmx</code> and <code>-sse</code>, we disable the default <code>mmx</code> and <code>sse</code> features.</p> <p>In order to compile for the new target, we need to adjust our Makefile:</p> <pre data-lang="diff" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-diff "><code class="language-diff" data-lang="diff"><span># in `Makefile` </span><span> </span><span> arch ?= x86_64 </span><span>-target ?= $(arch)-unknown-linux-gnu </span><span>+target ?= $(arch)-blog_os </span><span>... </span></code></pre> <p>The new target name (<code>x86_64-blog_os</code>) is the file name of the JSON configuration file without the <code>.json</code> extension.</p> <h3 id="cross-compilation"><a class="zola-anchor" href="#cross-compilation" aria-label="Anchor link for: cross-compilation">🔗</a>Cross compilation</h3> <p>Let’s try if our kernel still works with the new target:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; make run </span><span>Compiling raw-cpuid v2.0.1 </span><span>Compiling rlibc v0.1.5 </span><span>Compiling x86 v0.7.1 </span><span>Compiling spin v0.3.5 </span><span>error[E0463]: can&#39;t find crate for `core` </span><span> </span><span>error: aborting due to previous error </span><span> </span><span>Build failed, waiting for other jobs to finish... </span><span>... </span><span>Makefile:52: recipe for target &#39;cargo&#39; failed </span><span>make: *** [cargo] Error 101 </span></code></pre> <p>It doesn’t compile anymore. The error tells us that the Rust compiler no longer finds the core library.</p> <p>The <a href="https://doc.rust-lang.org/nightly/core/index.html">core library</a> is implicitly linked to all <code>no_std</code> crates and contains things such as <code>Result</code>, <code>Option</code>, and iterators. 
We’ve used that library without problems since <a href="https://os.phil-opp.com/set-up-rust/">the very beginning</a>, so why is it no longer available?</p> <p>The problem is that the core library is distributed together with the Rust compiler as a <em>precompiled</em> library. So it is only valid for the host triple, which is <code>x86_64-unknown-linux-gnu</code> in our case. If we want to compile code for other targets, we need to recompile <code>core</code> for these targets first.</p> <h4 id="xargo"><a class="zola-anchor" href="#xargo" aria-label="Anchor link for: xargo">🔗</a>Xargo</h4> <p>That’s where <a href="https://github.com/japaric/xargo">xargo</a> comes in. It is a wrapper for cargo that eases cross compilation. We can install it by executing:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>cargo install xargo </span></code></pre> <p>Xargo depends on the Rust source code, which we can install with <code>rustup component add rust-src</code>.</p> <p>Xargo is “a drop-in replacement for cargo”, so every cargo command also works with <code>xargo</code>. You can do e.g. <code>xargo --help</code>, <code>xargo clean</code>, or <code>xargo doc</code>. However, the <code>build</code> command gains additional functionality: <code>xargo build</code> will automatically cross compile the <code>core</code> library when compiling for custom targets.</p> <p>That’s exactly what we want, so we change one letter in our Makefile:</p> <pre data-lang="diff" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-diff "><code class="language-diff" data-lang="diff"><span># in `Makefile` </span><span>... </span><span> </span><span>cargo: </span><span>- @cargo build --target $(target) </span><span>+ @xargo build --target $(target) </span><span>... </span></code></pre> <p>Now the build goes through <code>xargo</code>, which should fix the compilation error. 
Let’s try it out:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; make run </span><span>Compiling core v0.0.0 (file:///home/…/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore) </span><span>LLVM ERROR: SSE register return with SSE disabled </span><span>error: Could not compile `core`. </span></code></pre> <p>Well, we get a different error now, so it seems like we’re making progress :). It seems like there is a “SSE register return” although SSE is disabled. But what’s an “SSE register return”?</p> <h3 id="sse-register-return"><a class="zola-anchor" href="#sse-register-return" aria-label="Anchor link for: sse-register-return">🔗</a>SSE Register Return</h3> <p>Remember when we discussed calling conventions above? The calling convention defines which registers are used for return values. Well, the <a href="https://refspecs.linuxbase.org/elf/gabi41.pdf">System V ABI</a> defines that <code>xmm0</code> should be used for returning floating point values. So somewhere in the <code>core</code> library a function returns a float and LLVM doesn’t know what to do. The ABI says “use <code>xmm0</code>” but the target specification says “don’t use <code>xmm</code> registers”.</p> <p>In order to fix this problem, we need to change our float ABI. The idea is to avoid normal hardware-supported floats and use a pure software implementation instead. We can do so by enabling the <code>soft-float</code> feature for our target. For that, we edit <code>x86_64-blog_os.json</code>:</p> <pre data-lang="json" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-json "><code class="language-json" data-lang="json"><span>{ </span><span> </span><span style="color:#d69d85;">&quot;llvm-target&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64-unknown-linux-gnu&quot;</span><span>, </span><span> </span><span style="color:#ff3333;">... 
</span><span> </span><span style="color:#d69d85;">&quot;features&quot;</span><span>: </span><span style="color:#d69d85;">&quot;-mmx,-sse,+soft-float&quot; </span><span>} </span></code></pre> <p>The plus prefix tells LLVM to enable the <code>soft-float</code> feature.</p> <p>Let’s try <code>make run</code> again:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; make run </span><span> Compiling core v0.0.0 (file:///…/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore) </span><span> Finished release [optimized] target(s) in 21.95 secs </span><span> Compiling spin v0.4.5 </span><span> Compiling once v0.3.2 </span><span> Compiling x86 v0.8.0 </span><span> Compiling bitflags v0.9.1 </span><span> Compiling raw-cpuid v2.0.1 </span><span> Compiling rlibc v0.1.5 </span><span> Compiling linked_list_allocator v0.2.3 </span><span> Compiling volatile v0.1.0 </span><span> Compiling bitflags v0.4.0 </span><span> Compiling bit_field v0.5.0 </span><span> Compiling spin v0.3.5 </span><span> Compiling multiboot2 v0.1.0 </span><span> Compiling lazy_static v0.2.2 </span><span> Compiling hole_list_allocator v0.1.0 (file:///…/libs/hole_list_allocator) </span><span> Compiling blog_os v0.1.0 (file:///…) </span><span>error[E0463]: can&#39;t find crate for `alloc` </span><span> --&gt; src/lib.rs:33:1 </span><span> | </span><span>33 | extern crate alloc; </span><span> | ^^^^^^^^^^^^^^^^^^^ can&#39;t find crate </span><span> </span><span>error: aborting due to previous error </span></code></pre> <p>We see that <code>xargo</code> now compiles the <code>core</code> crate in release mode. Then it starts the normal cargo build. Cargo then recompiles all dependencies, since it needs to generate different code for the new target.</p> <p>However, the build still fails. The reason is that xargo only installs <code>core</code> by default, but we also need the <code>alloc</code> crate. 
We can enable it by creating a file named <code>Xargo.toml</code> with the following contents:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># Xargo.toml </span><span> </span><span>[</span><span style="color:#808080;">target.x86_64-blog_os.dependencies</span><span>] </span><span style="color:#569cd6;">alloc </span><span>= {} </span></code></pre> <p>Now xargo compiles <code>alloc</code>, too:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; make run </span><span> Compiling core v0.0.0 (file:///…/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore) </span><span> Compiling std_unicode v0.0.0 (file:///…/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd_unicode) </span><span> Compiling alloc v0.0.0 (file:///…/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc) </span><span> Finished release [optimized] target(s) in 28.84 secs </span><span> Compiling blog_os v0.1.0 (file:///…/Documents/blog_os/master) </span><span>warning: unused variable: `allocator` […] </span><span>warning: unused variable: `frame` […] </span><span> </span><span> Finished debug [unoptimized + debuginfo] target(s) in 1.75 secs </span></code></pre> <p>It worked! Now we have a kernel that never touches the multimedia registers! We can verify this by executing:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; objdump -d build/kernel-x86_64.bin | grep &quot;mm[0-9]&quot; </span></code></pre> <p>If the command produces no output, our kernel uses neither MMX (<code>mm0</code> – <code>mm7</code>) nor SSE (<code>xmm0</code> – <code>xmm15</code>) registers.</p> <p>So now our return-from-exception logic works without problems in <em>most</em> cases. 
However, there is still a pitfall hidden in the C calling convention, which might cause hideous bugs in some rare cases.</p> <h2 id="the-red-zone"><a class="zola-anchor" href="#the-red-zone" aria-label="Anchor link for: the-red-zone">🔗</a>The Red Zone</h2> <p>The <a href="https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64#the-red-zone">red zone</a> is an optimization of the <a href="https://refspecs.linuxbase.org/elf/gabi41.pdf">System V ABI</a> that allows functions to temporarily use the 128 bytes below their stack frame without adjusting the stack pointer:</p> <p><img src="https://os.phil-opp.com/returning-from-exceptions/red-zone.svg" alt="stack frame with red zone" /></p> <p>The image shows the stack frame of a function with <code>n</code> local variables. On function entry, the stack pointer is adjusted to make room on the stack for the local variables.</p> <p>The red zone is defined as the 128 bytes below the adjusted stack pointer. The function can use this area for temporary data that’s not needed across function calls. Thus, the two instructions for adjusting the stack pointer can be avoided in some cases (e.g. in small leaf functions).</p> <p>However, this optimization leads to huge problems with exceptions. Let’s assume that an exception occurs while a function uses the red zone:</p> <p><img src="https://os.phil-opp.com/returning-from-exceptions/red-zone-overwrite.svg" alt="red zone overwritten by exception handler" /></p> <p>The CPU and the exception handler overwrite the data in the red zone. But this data is still needed by the interrupted function. So the function won’t work correctly anymore when we return from the exception handler. 
It might fail or cause another exception, but it could also lead to strange bugs that <a href="https://forum.osdev.org/viewtopic.php?t=21720">take weeks to debug</a>.</p> <h3 id="adjusting-our-exception-handler"><a class="zola-anchor" href="#adjusting-our-exception-handler" aria-label="Anchor link for: adjusting-our-exception-handler">🔗</a>Adjusting our Exception Handler?</h3> <p>The problem is that the <a href="https://refspecs.linuxbase.org/elf/gabi41.pdf">System V ABI</a> demands that the red zone <em>“shall not be modified by signal or interrupt handlers.”</em> Our current exception handlers do not respect this. We could try to fix it by subtracting 128 from the stack pointer before pushing anything:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span style="color:#569cd6;">sub </span><span>rsp, </span><span style="color:#b4cea8;">128 </span><span>save_scratch_registers() </span><span>... </span><span style="color:#569cd6;">call </span><span>... </span><span>... </span><span>restore_scratch_registers() </span><span style="color:#569cd6;">add </span><span>rsp, </span><span style="color:#b4cea8;">128 </span><span> </span><span style="color:#569cd6;">iretq </span></code></pre> <p><em>This will not work.</em> The problem is that the CPU pushes the exception stack frame before even calling our handler function. So the CPU itself will clobber the red zone and there is nothing we can do about that. 
So our only chance is to disable the red zone.</p> <h3 id="disabling-the-red-zone"><a class="zola-anchor" href="#disabling-the-red-zone" aria-label="Anchor link for: disabling-the-red-zone">🔗</a>Disabling the Red Zone</h3> <p>The red zone is a property of our target, so in order to disable it we edit our <code>x86_64-blog_os.json</code> a last time:</p> <pre data-lang="json" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-json "><code class="language-json" data-lang="json"><span>{ </span><span> </span><span style="color:#d69d85;">&quot;llvm-target&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64-unknown-linux-gnu&quot;</span><span>, </span><span> </span><span style="color:#ff3333;">... </span><span> </span><span style="color:#d69d85;">&quot;features&quot;</span><span>: </span><span style="color:#d69d85;">&quot;-mmx,-sse,+soft-float&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;disable-redzone&quot;</span><span>: </span><span style="color:#569cd6;">true </span><span>} </span></code></pre> <p>We add one additional option at the end: <code>"disable-redzone": true</code>. As you might guess, this option disables the red zone optimization.</p> <p>Now we have a red zone free kernel!</p> <h2 id="exceptions-with-error-codes"><a class="zola-anchor" href="#exceptions-with-error-codes" aria-label="Anchor link for: exceptions-with-error-codes">🔗</a>Exceptions with Error Codes</h2> <p>We’re now able to correctly return from exceptions without error codes. However, we still can’t return from exceptions that push an error code (e.g. page faults). Let’s fix that by updating our <code>handler_with_error_code</code> macro:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>macro_rules! 
handler_with_error_code { </span><span> ($name: ident) </span><span style="color:#569cd6;">=&gt; </span><span>{{ </span><span> #[naked] </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>wrapper() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> asm!(</span><span style="color:#d69d85;">&quot;pop rsi // pop error code into rsi </span><span style="color:#d69d85;"> mov rdi, rsp </span><span style="color:#d69d85;"> sub rsp, 8 // align the stack pointer </span><span style="color:#d69d85;"> call $0&quot; </span><span> :: </span><span style="color:#d69d85;">&quot;i&quot;</span><span>($name </span><span style="color:#569cd6;">as extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn</span><span>( </span><span> </span><span style="color:#569cd6;">&amp;</span><span>ExceptionStackFrame, </span><span style="color:#569cd6;">u64</span><span>)) </span><span> : </span><span style="color:#d69d85;">&quot;rdi&quot;</span><span>,</span><span style="color:#d69d85;">&quot;rsi&quot; </span><span>: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>); </span><span> asm!(</span><span style="color:#d69d85;">&quot;iretq&quot; </span><span>:::: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>, </span><span style="color:#d69d85;">&quot;volatile&quot;</span><span>); </span><span> ::core::intrinsics::unreachable(); </span><span> } </span><span> } </span><span> wrapper </span><span> }} </span><span>} </span></code></pre> <p>First, we change the type of the handler function: no more <code>-&gt; !</code>, so it no longer needs to diverge. 
We also add an <code>iretq</code> instruction at the end.</p> <p>Now we can make our <code>page_fault_handler</code> non-diverging:</p> <pre data-lang="diff" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-diff "><code class="language-diff" data-lang="diff"><span>// in src/interrupts/mod.rs </span><span> </span><span> extern &quot;C&quot; fn page_fault_handler(stack_frame: &amp;ExceptionStackFrame, </span><span>- error_code: u64) -&gt; ! { ... } </span><span>+ error_code: u64) { ... } </span></code></pre> <p>However, now we have the same problem as above: The handler function will overwrite the scratch registers and cause bugs when returning. Let’s fix this by invoking <code>save_scratch_registers</code> at the beginning:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>macro_rules! handler_with_error_code { </span><span> ($name: ident) </span><span style="color:#569cd6;">=&gt; </span><span>{{ </span><span> #[naked] </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>wrapper() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> save_scratch_registers!(); </span><span> asm!(</span><span style="color:#d69d85;">&quot;pop rsi // pop error code into rsi </span><span style="color:#d69d85;"> mov rdi, rsp </span><span style="color:#d69d85;"> add rdi, 10*8 // calculate exception stack frame pointer </span><span style="color:#d69d85;"> sub rsp, 8 // align the stack pointer </span><span style="color:#d69d85;"> call $0 </span><span style="color:#d69d85;"> add rsp, 8 // undo stack pointer alignment </span><span style="color:#d69d85;"> &quot; </span><span>:: </span><span style="color:#d69d85;">&quot;i&quot;</span><span>($name </span><span style="color:#569cd6;">as extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn</span><span>( </span><span> </span><span style="color:#569cd6;">&amp;</span><span>ExceptionStackFrame, </span><span style="color:#569cd6;">u64</span><span>)) </span><span> : </span><span style="color:#d69d85;">&quot;rdi&quot;</span><span>,</span><span style="color:#d69d85;">&quot;rsi&quot; </span><span>: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>); </span><span> restore_scratch_registers!(); </span><span> asm!(</span><span style="color:#d69d85;">&quot;iretq&quot; </span><span>:::: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>, </span><span style="color:#d69d85;">&quot;volatile&quot;</span><span>); </span><span> ::core::intrinsics::unreachable(); </span><span> } </span><span> } </span><span> wrapper </span><span> }} </span><span>} </span></code></pre> <p>Now we backup the scratch registers to the stack right at the beginning and restore them just before the <code>iretq</code>. 
Like in the <code>handler</code> macro, we now need to add <code>10*8</code> to <code>rdi</code> in order to get the correct exception stack frame pointer (<code>save_scratch_registers</code> pushes nine 8-byte registers, plus the error code). We also need to undo the stack pointer alignment after the <code>call</code><sup class="footnote-reference"><a href="#fn-stack-alignment">2</a></sup>.</p> <div class="footnote-definition" id="fn-stack-alignment"><sup class="footnote-definition-label">2</sup> <p>The stack alignment is actually wrong here, since we additionally pushed an odd number of registers. However, the <code>pop rsi</code> is wrong too, since the error code is no longer at the top of the stack. When we fix that problem, the stack alignment becomes correct again. So I left it in to keep things simple.</p> </div> <p>Now we have one last bug: We <code>pop</code> the error code into <code>rsi</code>, but the error code is no longer at the top of the stack (since <code>save_scratch_registers</code> pushed 9 registers on top of it). So we need to do it differently:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>macro_rules! handler_with_error_code { </span><span> ($name: ident) </span><span style="color:#569cd6;">=&gt; </span><span>{{ </span><span> #[naked] </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>wrapper() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> save_scratch_registers!(); </span><span> asm!(</span><span style="color:#d69d85;">&quot;mov rsi, [rsp + 9*8] // load error code into rsi </span><span style="color:#d69d85;"> mov rdi, rsp </span><span style="color:#d69d85;"> add rdi, 10*8 // calculate exception stack frame pointer </span><span style="color:#d69d85;"> sub rsp, 8 // align the stack pointer </span><span style="color:#d69d85;"> call $0 </span><span style="color:#d69d85;"> add rsp, 8 // undo stack pointer alignment </span><span style="color:#d69d85;"> &quot; </span><span>:: </span><span style="color:#d69d85;">&quot;i&quot;</span><span>($name </span><span style="color:#569cd6;">as extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn</span><span>( </span><span> </span><span style="color:#569cd6;">&amp;</span><span>ExceptionStackFrame, </span><span style="color:#569cd6;">u64</span><span>)) </span><span> : </span><span style="color:#d69d85;">&quot;rdi&quot;</span><span>,</span><span style="color:#d69d85;">&quot;rsi&quot; </span><span>: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>); </span><span> restore_scratch_registers!(); </span><span> asm!(</span><span style="color:#d69d85;">&quot;add rsp, 8 // pop error code </span><span style="color:#d69d85;"> iretq&quot; </span><span>:::: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>, </span><span style="color:#d69d85;">&quot;volatile&quot;</span><span>); </span><span> ::core::intrinsics::unreachable(); </span><span> } </span><span> } </span><span> wrapper </span><span> }} </span><span>} </span></code></pre> <p>Instead of using <code>pop</code>, we’re calculating the error code address manually (<code>save_scratch_registers</code> pushes nine 8 byte registers) and load it into <code>rsi</code> using a <code>mov</code>. So now the error code stays on the stack. 
But <code>iretq</code> doesn’t handle the error code, so we need to pop it before invoking <code>iretq</code>.</p> <p>Phew! That was a lot of fiddling with assembly. Let’s test if it still works.</p> <h3 id="testing-1"><a class="zola-anchor" href="#testing-1" aria-label="Anchor link for: testing-1">🔗</a>Testing</h3> <p>First, we test if the exception stack frame pointer and the error code are still correct:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in rust_main in src/lib.rs </span><span> </span><span style="color:#569cd6;">... </span><span style="color:#569cd6;">unsafe </span><span>{ int!(</span><span style="color:#b5cea8;">3</span><span>) }; </span><span> </span><span style="color:#608b4e;">// provoke a page fault </span><span style="color:#569cd6;">unsafe </span><span>{ *(</span><span style="color:#b5cea8;">0xdeadbeaf </span><span style="color:#569cd6;">as *mut u64</span><span>) = </span><span style="color:#b5cea8;">42</span><span>; } </span><span> </span><span>println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span style="color:#569cd6;">loop </span><span>{} </span></code></pre> <p>This should cause the following error message:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>EXCEPTION: PAGE FAULT while accessing 0xdeadbeaf </span><span>error code: CAUSED_BY_WRITE </span><span>ExceptionStackFrame { </span><span> instruction_pointer: 1114753, </span><span> code_segment: 8, </span><span> cpu_flags: 2097158, </span><span> stack_pointer: 1171104, </span><span> stack_segment: 16 </span><span>} </span></code></pre> <p>The error code should still be <code>CAUSED_BY_WRITE</code> and the exception stack frame values should also be correct (e.g. 
<code>code_segment</code> should be 8 and <code>stack_segment</code> should be 16).</p> <h4 id="returning-from-page-faults"><a class="zola-anchor" href="#returning-from-page-faults" aria-label="Anchor link for: returning-from-page-faults">🔗</a>Returning from Page Faults</h4> <p>Let’s see what happens if we comment out the trailing <code>loop</code> in our page fault handler:</p> <p><img src="https://os.phil-opp.com/returning-from-exceptions/qemu-page-fault-return.png" alt="QEMU printing the same page fault message again and again" /></p> <p>We see that the same error message is printed over and over again. Here is what happens:</p> <ul> <li>The CPU executes <code>rust_main</code> and tries to access <code>0xdeadbeaf</code>. This causes a page fault.</li> <li>The page fault handler prints an error message and returns without fixing the cause of the exception (<code>0xdeadbeaf</code> is still inaccessible).</li> <li>The CPU restarts the instruction that caused the page fault and thus tries to access <code>0xdeadbeaf</code> again. Of course, this causes a page fault again.</li> <li>The page fault handler prints the error message and returns.</li> </ul> <p>… and so on. Thus, our code indefinitely jumps between the page fault handler and the instruction that accesses <code>0xdeadbeaf</code>.</p> <p>This is a good thing! It means that our <code>iretq</code> logic is working correctly, since it returns to the correct instruction every time. So our <code>handler_with_error_code</code> macro seems to be correct.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>We are now able to catch exceptions and return from them. However, there are still exceptions that completely crash our kernel by causing a <a href="https://en.wikipedia.org/wiki/Triple_fault">triple fault</a>. 
In the next post, we will fix this issue by handling a special type of exception: the <a href="https://en.wikipedia.org/wiki/Double_fault">double fault</a>. Thus, we will be able to avoid random reboots in our kernel.</p> Better Exception Messages Wed, 03 Aug 2016 00:00:00 +0000 https://os.phil-opp.com/better-exception-messages/ https://os.phil-opp.com/better-exception-messages/ <p>In this post, we explore exceptions in more detail. Our goal is to print additional information when an exception occurs, for example the values of the instruction and stack pointer. In the course of this, we will explore inline assembly and naked functions. We will also add a handler function for page faults and read the associated error code.</p> <span id="continue-reading"></span> <p>As always, the complete source code is on <a href="https://github.com/phil-opp/blog_os/tree/better_exception_messages">GitHub</a>. Please file <a href="https://github.com/phil-opp/blog_os/issues">issues</a> for any problems, questions, or improvement suggestions. There is also a <a href="https://gitter.im/phil-opp/blog_os">gitter chat</a> and a comment section at the end of this page.</p> <blockquote> <p><strong>Note</strong>: This post describes how to handle exceptions using naked functions (see <a href="https://os.phil-opp.com/edition-1/extra/naked-exceptions/">“Handling Exceptions with Naked Functions”</a> for an overview). Our new way of handling exceptions can be found in the <a href="https://os.phil-opp.com/handling-exceptions/">“Handling Exceptions”</a> post.</p> </blockquote> <h2 id="exceptions-in-detail"><a class="zola-anchor" href="#exceptions-in-detail" aria-label="Anchor link for: exceptions-in-detail">🔗</a>Exceptions in Detail</h2> <p>An exception signals that something is wrong with the currently-executed instruction. 
Whenever an exception occurs, the CPU interrupts its current work and starts an internal exception routine.</p> <p>This routine involves reading the interrupt descriptor table and invoking the registered handler function. But first, the CPU pushes several pieces of information onto the stack, which describe the current state and the cause of the exception:</p> <p><img src="https://os.phil-opp.com/better-exception-messages/exception-stack-frame.svg" alt="exception stack frame" /></p> <p>The pushed information contains the instruction and stack pointers, the current CPU flags, and (for some exceptions) an error code, which contains further information about the cause of the exception. Let’s look at the fields in detail:</p> <ul> <li>First, the CPU aligns the stack pointer on a 16-byte boundary. This allows the handler function to use SSE instructions, some of which expect such an alignment.</li> <li>After that, the CPU pushes the stack segment descriptor (SS) and the old stack pointer (from before the alignment) onto the stack. This allows us to restore the previous stack pointer when we want to resume the interrupted program.</li> <li>Then the CPU pushes the contents of the <a href="https://en.wikipedia.org/wiki/FLAGS_register">RFLAGS</a> register. This register contains various state information of the interrupted program. For example, it indicates whether interrupts were enabled and whether the result of the last executed instruction was zero.</li> <li>Next the CPU pushes the instruction pointer and its code segment descriptor onto the stack. This tells us the address of the last executed instruction, which caused the exception.</li> <li>Finally, the CPU pushes an error code for some exceptions. This error code only exists for exceptions such as page faults or general protection faults and provides additional information. 
For example, it tells us whether a page fault was caused by a read or a write request.</li> </ul> <h2 id="printing-the-exception-stack-frame"><a class="zola-anchor" href="#printing-the-exception-stack-frame" aria-label="Anchor link for: printing-the-exception-stack-frame">🔗</a>Printing the Exception Stack Frame</h2> <p>Let’s create a struct that represents the exception stack frame:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>#[derive(Debug)] </span><span>#[repr(C)] </span><span style="color:#569cd6;">struct </span><span>ExceptionStackFrame { </span><span> instruction_pointer: </span><span style="color:#569cd6;">u64</span><span>, </span><span> code_segment: </span><span style="color:#569cd6;">u64</span><span>, </span><span> cpu_flags: </span><span style="color:#569cd6;">u64</span><span>, </span><span> stack_pointer: </span><span style="color:#569cd6;">u64</span><span>, </span><span> stack_segment: </span><span style="color:#569cd6;">u64</span><span>, </span><span>} </span></code></pre> <p>The divide-by-zero fault pushes no error code, so we leave it out for now. Note that the stack grows downwards in memory, so we need to declare the fields in reverse order (compared to the figure above).</p> <p>Now we need a way to find the memory address of this stack frame. When we look at the above graphic again, we see that the start address of the exception stack frame is the new stack pointer. 
So we just need to read the value of <code>rsp</code> at the very beginning of our handler function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>divide_by_zero_handler() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> stack_frame: </span><span style="color:#569cd6;">&amp;</span><span>ExceptionStackFrame; </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> asm!(</span><span style="color:#d69d85;">&quot;mov $0, rsp&quot; </span><span>: </span><span style="color:#d69d85;">&quot;=r&quot;</span><span>(stack_frame) ::: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>); </span><span> } </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">EXCEPTION: DIVIDE BY ZERO</span><span style="color:#e3bbab;">\n</span><span style="color:#b4cea8;">{:#?}</span><span style="color:#d69d85;">&quot;</span><span>, stack_frame); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>We’re using <a href="https://doc.rust-lang.org/1.10.0/book/inline-assembly.html">inline assembly</a> here to load the value from the <code>rsp</code> register into <code>stack_frame</code>. The syntax is a bit strange, so here’s a quick explanation:</p> <ul> <li>The <code>asm!</code> macro emits raw assembly instructions. This is the only way to read raw register values in Rust.</li> <li>We insert a single assembly instruction: <code>mov $0, rsp</code>. 
It moves the value of <code>rsp</code> to some register (the <code>$0</code> is a placeholder for an arbitrary register, which gets filled by the compiler).</li> <li>The colons are separators. After the first colon, the <code>asm!</code> macro expects output operands. We’re specifying our <code>stack_frame</code> variable as a single output operand here. The <code>=r</code> tells the compiler that it should use any register for the first placeholder <code>$0</code>.</li> <li>After the second colon, we can specify input operands. We don’t need any, so we leave it empty.</li> <li>After the third colon, the macro expects so-called <a href="https://doc.rust-lang.org/1.10.0/book/inline-assembly.html#clobbers">clobbers</a>. We don’t change any register values apart from the output operand, so we leave it empty too.</li> <li>The last block (after the fourth colon) specifies options. The <code>intel</code> option tells the compiler that our code is in Intel assembly syntax (instead of the default AT&amp;T syntax).</li> </ul> <p>So the inline assembly loads the stack pointer value into <code>stack_frame</code> at the very beginning of our function. Thus we have a pointer to the exception stack frame and are able to pretty-print its <code>Debug</code> formatting through the <code>{:#?}</code> argument.</p> <h3 id="testing-it"><a class="zola-anchor" href="#testing-it" aria-label="Anchor link for: testing-it">🔗</a>Testing it</h3> <p>Let’s try it by executing <code>make run</code>:</p> <p><img src="https://os.phil-opp.com/better-exception-messages/qemu-print-stack-frame-try.png" alt="qemu printing an ExceptionStackFrame with strange values" /></p> <p>Those <code>ExceptionStackFrame</code> values look very wrong. The instruction pointer definitely shouldn’t be 1 and the code segment should be <code>0x8</code> instead of some big number. 
So what’s going on here?</p> <h3 id="debugging"><a class="zola-anchor" href="#debugging" aria-label="Anchor link for: debugging">🔗</a>Debugging</h3> <p>It seems like we somehow got the pointer wrong. The <code>ExceptionStackFrame</code> type and our inline assembly seem correct, so something must be modifying <code>rsp</code> before we load it into <code>stack_frame</code>.</p> <p>Let’s see what’s happening by looking at the disassembly of our function:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; objdump -d build/kernel-x86_64.bin | grep -A20 &quot;divide_by_zero_handler&quot; </span><span> </span><span> [...] </span><span>000000000010ced0 &lt;_ZN7blog_os10interrupts22divide_by_zero_handler17h62189e8E&gt;: </span><span> 10ced0: 55 push %rbp </span><span> 10ced1: 48 89 e5 mov %rsp,%rbp </span><span> 10ced4: 48 81 ec b0 00 00 00 sub $0xb0,%rsp </span><span> 10cedb: 48 8d 45 98 lea -0x68(%rbp),%rax </span><span> 10cedf: 48 b9 1d 1d 1d 1d 1d movabs $0x1d1d1d1d1d1d1d1d,%rcx </span><span> 10cee6: 1d 1d 1d </span><span> 10cee9: 48 89 4d 98 mov %rcx,-0x68(%rbp) </span><span> 10ceed: 48 89 4d f8 mov %rcx,-0x8(%rbp) </span><span> 10cef1: 48 89 e1 mov %rsp,%rcx </span><span> 10cef4: 48 89 4d f8 mov %rcx,-0x8(%rbp) </span><span> 10cef8: ... </span><span>[...] </span></code></pre> <p>Our <code>divide_by_zero_handler</code> starts at address <code>0x10ced0</code>. Let’s look at the instruction at address <code>0x10cef1</code>:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>mov %rsp,%rcx </span></code></pre> <p>This is our inline assembly instruction, which loads the stack pointer into the <code>stack_frame</code> variable. It just looks a bit different, since it’s in AT&amp;T syntax and contains <code>rcx</code> instead of our <code>$0</code> placeholder. 
It moves <code>rsp</code> to <code>rcx</code>, and then the next instruction (<code>mov %rcx,-0x8(%rbp)</code>) moves <code>rcx</code> to the variable on the stack.</p> <p>We can clearly see the problem here: The compiler inserted various other instructions before our inline assembly. These instructions modify the stack pointer, so we no longer read the original <code>rsp</code> value and end up with a wrong pointer. But why is the compiler doing this?</p> <p>The reason is that we need some place on the stack to store things like variables. Therefore the compiler inserts a so-called <em><a href="https://en.wikipedia.org/wiki/Function_prologue">function prologue</a></em>, which prepares the stack and reserves space for all variables. In our case, the compiler subtracts from the stack pointer to make room for, among other things, our <code>stack_frame</code> variable. This prologue is the first thing in every function and comes before all other code.</p> <p>So in order to correctly load the exception frame pointer, we need some way to circumvent the automatic prologue generation.</p> <h3 id="naked-functions"><a class="zola-anchor" href="#naked-functions" aria-label="Anchor link for: naked-functions">🔗</a>Naked Functions</h3> <p>Fortunately there is a way to disable the prologue: <a href="https://github.com/rust-lang/rfcs/blob/master/text/1201-naked-fns.md">naked functions</a>. A naked function has no prologue and immediately starts with the first instruction of its body. However, most Rust code requires the prologue. 
Therefore naked functions should only contain inline assembly.</p> <p>A naked function looks like this (note the <code>#[naked]</code> attribute):</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[naked] </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>naked_function_example() { </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> asm!(</span><span style="color:#d69d85;">&quot;mov rax, 0x42&quot; </span><span>::: </span><span style="color:#d69d85;">&quot;rax&quot; </span><span>: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>); </span><span> }; </span><span>} </span></code></pre> <p>Naked functions are highly unstable, so we need to add <code>#![feature(naked_functions)]</code> to our <code>src/lib.rs</code>.</p> <p>If you want to try it, insert it in <code>src/lib.rs</code> and call it from <code>rust_main</code>. When we inspect the disassembly, we see that the function prologue is missing:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; objdump -d build/kernel-x86_64.bin | grep -A5 &quot;naked_function_example&quot; </span><span>[...] </span><span>000000000010df90 &lt;_ZN7blog_os22naked_function_example17ha9f733dfe42b595dE&gt;: </span><span> 10df90: 48 c7 c0 2a 00 00 00 mov $0x42,%rax </span><span> 10df97: c3 retq </span><span> 10df98: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) </span><span> 10df9f: 00 </span></code></pre> <p>It contains just the specified inline assembly and a return instruction (you can ignore the junk values after the return statement). 
So let’s try to use a naked function to retrieve the exception frame pointer.</p> <h3 id="a-naked-exception-handler"><a class="zola-anchor" href="#a-naked-exception-handler" aria-label="Anchor link for: a-naked-exception-handler">🔗</a>A Naked Exception Handler</h3> <p>We can’t use Rust code in naked functions, but we still want to use Rust in our exception handler. Therefore we split our handler function in two parts. A main exception handler in Rust and a small naked wrapper function, which just loads the exception frame pointer and then calls the main handler.</p> <p>Our new two-stage exception handler looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>#[naked] </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>divide_by_zero_wrapper() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> asm!(</span><span style="color:#608b4e;">/* load exception frame pointer and call main handler */</span><span>); </span><span> } </span><span>} </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>divide_by_zero_handler(stack_frame: </span><span style="color:#569cd6;">&amp;</span><span>ExceptionStackFrame) </span><span> -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">EXCEPTION: DIVIDE BY ZERO</span><span style="color:#e3bbab;">\n</span><span style="color:#b4cea8;">{:#?}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;</span><span>*stack_frame }); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span><span> </span></code></pre> <p>The naked wrapper function retrieves the exception stack frame pointer and then calls the <code>divide_by_zero_handler</code> with the pointer as its argument. We can’t use Rust code in naked functions, so we need to do both things in inline assembly.</p> <p>Retrieving the pointer to the exception stack frame is easy: We just need to load it from the <code>rsp</code> register. Our wrapper function has no prologue (it’s naked), so we can be sure that nothing modifies the register before we read it.</p> <p>Calling the main handler is a bit more complicated, since we need to pass the argument correctly. Our main handler uses the C calling convention, which specifies that the first argument is passed in the <code>rdi</code> register. So we need to load the pointer value into <code>rdi</code> and then use the <code>call</code> instruction to call <code>divide_by_zero_handler</code>.</p> <p>Translated to assembly, it looks like this:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span style="color:#569cd6;">mov </span><span>rdi, rsp </span><span style="color:#569cd6;">call </span><span>divide_by_zero_handler </span></code></pre> <p>It moves the exception stack frame pointer from <code>rsp</code> to <code>rdi</code>, where the first argument is expected, and then calls the main handler. 
Let’s create the corresponding inline assembly to complete our wrapper function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[naked] </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>divide_by_zero_wrapper() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> asm!(</span><span style="color:#d69d85;">&quot;mov rdi, rsp; call $0&quot; </span><span> :: </span><span style="color:#d69d85;">&quot;i&quot;</span><span>(divide_by_zero_handler </span><span style="color:#569cd6;">as extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn</span><span>(</span><span style="color:#569cd6;">_</span><span>) -&gt; </span><span style="color:#569cd6;">!</span><span>) </span><span> : </span><span style="color:#d69d85;">&quot;rdi&quot; </span><span>: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>); </span><span> } </span><span>} </span></code></pre> <p>Instead of <code>call divide_by_zero_handler</code>, we use a placeholder again. The reason is Rust’s name mangling, which changes the name of the <code>divide_by_zero_handler</code> function. To circumvent this, we pass a function pointer as input parameter (after the second colon). The <code>"i"</code> tells the compiler that it is an immediate value, which can be directly inserted for the placeholder. 
We also specify a clobber after the third colon, which tells the compiler that we change the value of the <code>rdi</code> register.</p> <h3 id="intrinsics-unreachable"><a class="zola-anchor" href="#intrinsics-unreachable" aria-label="Anchor link for: intrinsics-unreachable">🔗</a>Intrinsics::Unreachable</h3> <p>When we try to compile it, we get the following error:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: computation may converge in a function marked as diverging </span><span> --&gt; src/interrupts/mod.rs:23:1 </span><span> |&gt; </span><span>23 |&gt; extern &quot;C&quot; fn divide_by_zero_wrapper() -&gt; ! { </span><span> |&gt; ^ </span></code></pre> <p>The reason is that we marked our <code>divide_by_zero_wrapper</code> function as diverging (the <code>!</code>). We call another diverging function in inline assembly, so it is clear that the function diverges. However, the Rust compiler doesn’t understand inline assembly, so it doesn’t know that. To fix this, we tell the compiler that all code after the <code>asm!</code> macro is unreachable:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[naked] </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>divide_by_zero_wrapper() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> asm!(</span><span style="color:#d69d85;">&quot;mov rdi, rsp; call $0&quot; </span><span> :: </span><span style="color:#d69d85;">&quot;i&quot;</span><span>(divide_by_zero_handler </span><span style="color:#569cd6;">as extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn</span><span>(</span><span style="color:#569cd6;">_</span><span>) -&gt; </span><span style="color:#569cd6;">!</span><span>) </span><span> : </span><span style="color:#d69d85;">&quot;rdi&quot; </span><span>: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>); </span><span> ::core::intrinsics::unreachable(); </span><span> } </span><span>} </span></code></pre> <p>The <a href="https://doc.rust-lang.org/nightly/core/intrinsics/fn.unreachable.html">intrinsics::unreachable</a> function is unstable, so we need to add <code>#![feature(core_intrinsics)]</code> to our <code>src/lib.rs</code>. It is just an annotation for the compiler and produces no real code. (Not to be confused with the <a href="https://doc.rust-lang.org/nightly/core/macro.unreachable!.html">unreachable!</a> macro, which is completely different!)</p> <h3 id="it-works"><a class="zola-anchor" href="#it-works" aria-label="Anchor link for: it-works">🔗</a>It works!</h3> <p>The last step is to update the interrupt descriptor table (IDT) to use our new wrapper function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>lazy_static! 
{ </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">IDT</span><span>: idt::Idt = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = idt::Idt::new(); </span><span> idt.set_handler(</span><span style="color:#b5cea8;">0</span><span>, divide_by_zero_wrapper); </span><span style="color:#608b4e;">// changed </span><span> idt </span><span> }; </span><span>} </span></code></pre> <p>Now we see a correct exception stack frame when we execute <code>make run</code>:</p> <p><img src="https://os.phil-opp.com/better-exception-messages/qemu-divide-by-zero-stack-frame.png" alt="QEMU showing correct divide by zero stack frame" /></p> <h2 id="testing-on-real-hardware"><a class="zola-anchor" href="#testing-on-real-hardware" aria-label="Anchor link for: testing-on-real-hardware">🔗</a>Testing on real Hardware</h2> <p>Virtual machines such as QEMU are very convenient for quickly testing our kernel. However, they might behave a bit differently from real hardware in some situations. So we should test our kernel on real hardware, too.</p> <p>Let’s do it by writing it to a USB stick:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; sudo dd if=build/os-x86_64.iso of=/dev/sdX; and sync </span></code></pre> <p>Replace <code>sdX</code> with the device name of your USB stick. But <strong>be careful</strong>! The command will erase everything on that device.</p> <p>Now we should be able to boot from this USB stick. When we do it, we see that it works fine on real hardware, too. Great!</p> <p>However, this section wouldn’t exist if there weren’t a problem. 
To trigger this problem, we add some example code to the start of our <code>divide_by_zero_handler</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>divide_by_zero_handler(...) { </span><span> </span><span style="color:#569cd6;">let</span><span> x = (</span><span style="color:#b5cea8;">1</span><span style="color:#569cd6;">u64</span><span>, </span><span style="color:#b5cea8;">2</span><span style="color:#569cd6;">u64</span><span>, </span><span style="color:#b5cea8;">3</span><span style="color:#569cd6;">u64</span><span>); </span><span> </span><span style="color:#569cd6;">let</span><span> y = Some(x); </span><span> </span><span style="color:#569cd6;">for</span><span> i </span><span style="color:#569cd6;">in </span><span>(</span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">100</span><span>).map(|z| (z, z - </span><span style="color:#b5cea8;">1</span><span>)) {} </span><span> </span><span> println!(</span><span style="color:#569cd6;">...</span><span>); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>This is just some garbage code that doesn’t do anything useful. When we try it in QEMU using <code>make run</code>, it still works fine. However, when we write it to a USB stick again and boot from it on real hardware, we see that our computer reboots just before printing the exception message.</p> <p>So our code, which worked well in QEMU, <em>causes a triple fault</em> on real hardware. 
What’s happening?</p> <h3 id="reproducing-the-bug-in-qemu"><a class="zola-anchor" href="#reproducing-the-bug-in-qemu" aria-label="Anchor link for: reproducing-the-bug-in-qemu">🔗</a>Reproducing the Bug in QEMU</h3> <p>Debugging on a real machine is difficult. Fortunately there is a way to reproduce this bug in QEMU: We use Linux’s <a href="https://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine">Kernel-based Virtual Machine</a> (KVM) by passing the <code>‑enable-kvm</code> flag:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; qemu-system-x86_64 -cdrom build/os-x86_64.iso -enable-kvm </span></code></pre> <p>Now QEMU triple faults as well. This should make debugging much easier.</p> <h3 id="debugging-1"><a class="zola-anchor" href="#debugging-1" aria-label="Anchor link for: debugging-1">🔗</a>Debugging</h3> <p>QEMU’s <code>-d int</code>, which prints every exception, doesn’t seem to work in KVM mode. However <code>-d cpu_reset</code> still works. It prints the complete CPU state whenever the CPU resets. Let’s try it:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; qemu-system-x86_64 -cdrom build/os-x86_64.iso -enable-kvm -d cpu_reset </span><span>CPU Reset (CPU 0) </span><span>EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000 </span><span>ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 </span><span>EIP=00000000 EFL=00000000 [-------] CPL=0 II=0 A20=0 SMM=0 HLT=0 </span><span>[...] </span><span>CPU Reset (CPU 0) </span><span>EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000663 </span><span>ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 </span><span>EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 </span><span>[...] 
</span><span>CPU Reset (CPU 0) </span><span>RAX=0000000000118cb8 RBX=0000000000000800 RCX=1d1d1d1d1d1d1d1d RDX=0..0000000 </span><span>RSI=0000000000112cd0 RDI=0000000000118d38 RBP=0000000000118d28 RSP=0..0118c68 </span><span>R8 =0000000000000000 R9 =0000000000000100 R10=0000000000118700 R11=0..0118a00 </span><span>R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0..0000000 </span><span>RIP=000000000010cf08 RFL=00210002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 </span><span>[...] </span></code></pre> <p>The first two resets occur while the CPU is still in 32-bit mode (<code>EAX</code> instead of <code>RAX</code>), so we ignore them. The third reset is the interesting one, because it occurs in 64-bit mode. The register dump tells us that the instruction pointer (<code>rip</code>) was <code>0x10cf08</code> just before the reset. This might be the address of the instruction that caused the triple fault.</p> <p>We can find the corresponding instruction by disassembling our kernel:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>objdump -d build/kernel-x86_64.bin | grep &quot;10cf08:&quot; </span><span> 10cf08: 0f 29 45 b0 movaps %xmm0,-0x50(%rbp) </span></code></pre> <p>The <a href="https://www.felixcloutier.com/x86/movaps">movaps</a> instruction is an <a href="https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</a> instruction that moves aligned 128bit values. It can fail for a number of reasons:</p> <ol> <li>For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.</li> <li>For an illegal address in the SS segment.</li> <li>If a memory operand is not aligned on a 16-byte boundary.</li> <li>For a page fault.</li> <li>If TS in CR0 is set.</li> </ol> <p>The segment registers contain no meaningful values in long mode, so they can’t contain illegal addresses. 
We did not change the TS bit in <a href="https://en.wikipedia.org/wiki/Control_register#CR0">CR0</a> and there is no reason for a page fault either. So it has to be option 3.</p> <h3 id="16-byte-alignment"><a class="zola-anchor" href="#16-byte-alignment" aria-label="Anchor link for: 16-byte-alignment">🔗</a>16-byte Alignment</h3> <p>Some SSE instructions such as <code>movaps</code> require that memory operands are 16-byte aligned. In our case, the instruction is <code>movaps %xmm0,-0x50(%rbp)</code>, which writes to address <code>rbp - 0x50</code>. Therefore <code>rbp</code> needs to be 16-byte aligned.</p> <p>Let’s look at the above <code>-d cpu_reset</code> dump again and check the value of <code>rbp</code>:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>CPU Reset (CPU 0) </span><span>RAX=[...] RBX=[...] RCX=[...] RDX=[...] </span><span>RSI=[...] RDI=[...] RBP=0000000000118d28 RSP=[...] </span><span>... </span></code></pre> <p><code>RBP</code> is <code>0x118d28</code>, which is <em>not</em> 16-byte aligned. So this is the reason for the triple fault. 
(It seems like QEMU doesn’t check the alignment for <code>movaps</code>, but real hardware of course does.)</p> <p>But how did we end up with a misaligned <code>rbp</code> register?</p> <h3 id="the-base-pointer"><a class="zola-anchor" href="#the-base-pointer" aria-label="Anchor link for: the-base-pointer">🔗</a>The Base Pointer</h3> <p>In order to solve this mystery, we need to look at the disassembly of the preceding code:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; objdump -d build/kernel-x86_64.bin | grep -B10 &quot;10cf08:&quot; </span><span>000000000010cee0 &lt;_ZN7blog_os10interrupts22divide_by_zero_handler17hE&gt;: </span><span> 10cee0: 55 push %rbp </span><span> 10cee1: 48 89 e5 mov %rsp,%rbp </span><span> 10cee4: 48 81 ec c0 00 00 00 sub $0xc0,%rsp </span><span> 10ceeb: 48 8d 45 90 lea -0x70(%rbp),%rax </span><span> 10ceef: 48 b9 1d 1d 1d 1d 1d movabs $0x1d1d1d1d1d1d1d1d,%rcx </span><span> 10cef6: 1d 1d 1d </span><span> 10cef9: 48 89 4d 90 mov %rcx,-0x70(%rbp) </span><span> 10cefd: 48 89 7d f8 mov %rdi,-0x8(%rbp) </span><span> 10cf01: 0f 10 05 a8 51 00 00 movups 0x51a8(%rip),%xmm0 </span><span> 10cf08: 0f 29 45 b0 movaps %xmm0,-0x50(%rbp) </span></code></pre> <p>At the last line we have the <code>movaps</code> instruction, which caused the triple fault. The exception occurs inside our <code>divide_by_zero_handler</code> function. We see that <code>rbp</code> is loaded with the value of <code>rsp</code> at the beginning (at <code>0x10cee1</code>). The <code>rbp</code> register holds the so-called <em>base pointer</em>, which points to the beginning of the stack frame. It is used in the rest of the function to address variables and other values on the stack.</p> <p>The base pointer is initialized directly from the stack pointer (<code>rsp</code>) after pushing the old base pointer. 
There is no special alignment code, so the compiler blindly assumes that <code>(rsp - 8)</code><sup class="footnote-reference"><a href="#fn-rsp-8">1</a></sup> is always 16-byte aligned. This seems to be wrong in our case. But why does the compiler assume this?</p> <div class="footnote-definition" id="fn-rsp-8"><sup class="footnote-definition-label">1</sup> <p>By pushing the old base pointer, <code>rsp</code> is updated to <code>rsp-8</code>.</p> </div> <h3 id="calling-conventions"><a class="zola-anchor" href="#calling-conventions" aria-label="Anchor link for: calling-conventions">🔗</a>Calling Conventions</h3> <p>The reason is that our exception handler is defined as an <code>extern "C"</code> function, which specifies that it’s using the C <a href="https://en.wikipedia.org/wiki/X86_calling_conventions">calling convention</a>. On x86_64 Linux, the C calling convention is specified by the System V AMD64 ABI (<a href="https://web.archive.org/web/20160801075139/https://www.x86-64.org/documentation/abi.pdf">PDF</a>). Section 3.2.2 defines the following:</p> <blockquote> <p>The end of the input argument area shall be aligned on a 16 byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 when control is transferred to the function entry point.</p> </blockquote> <p>The “end of the input argument area” refers to the last stack-passed argument (in our case there aren’t any). So the stack pointer must be 16 byte aligned whenever we <code>call</code> a C-compatible function. The <code>call</code> instruction then pushes the return address on the stack so that “the value (%rsp + 8) is a multiple of 16 when control is transferred to the function entry point”.</p> <p><em>Summary</em>: The calling convention requires a 16 byte aligned stack pointer before <code>call</code> instructions. The compiler relies on this requirement, but we broke it somehow. 
Thus the generated code triple faults due to a misaligned memory address in the <code>movaps</code> instruction.</p> <h3 id="fixing-the-alignment"><a class="zola-anchor" href="#fixing-the-alignment" aria-label="Anchor link for: fixing-the-alignment">🔗</a>Fixing the Alignment</h3> <p>In order to fix this bug, we need to make sure that the stack pointer is correctly aligned before calling <code>extern "C"</code> functions. Let’s summarize the stack pointer modifications that occur before the exception handler is called:</p> <ol> <li>The CPU aligns the stack pointer to a 16 byte boundary.</li> <li>The CPU pushes <code>ss</code>, <code>rsp</code>, <code>rflags</code>, <code>cs</code>, and <code>rip</code>. So it pushes five 8 byte registers, which makes <code>rsp</code> misaligned.</li> <li>The wrapper function calls <code>divide_by_zero_handler</code> with a misaligned stack pointer.</li> </ol> <p>The problem is that we’re pushing an odd number of 8 byte registers. Thus we need to align the stack pointer again before the <code>call</code> instruction:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[naked] </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>divide_by_zero_wrapper() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> asm!(</span><span style="color:#d69d85;">&quot;mov rdi, rsp </span><span style="color:#d69d85;"> sub rsp, 8 // align the stack pointer </span><span style="color:#d69d85;"> call $0&quot; </span><span> :: </span><span style="color:#d69d85;">&quot;i&quot;</span><span>(divide_by_zero_handler </span><span style="color:#569cd6;">as extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn</span><span>(</span><span style="color:#569cd6;">_</span><span>) -&gt; </span><span style="color:#569cd6;">!</span><span>) </span><span> : </span><span style="color:#d69d85;">&quot;rdi&quot; </span><span>: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>); </span><span> ::core::intrinsics::unreachable(); </span><span> } </span><span>} </span></code></pre> <p>The additional <code>sub rsp, 8</code> instruction aligns the stack pointer to a 16 byte boundary. Now it should work on real hardware (and in QEMU KVM mode) again.</p> <h2 id="a-handler-macro"><a class="zola-anchor" href="#a-handler-macro" aria-label="Anchor link for: a-handler-macro">🔗</a>A Handler Macro</h2> <p>The next step is to add handlers for other exceptions. However, we would need wrapper functions for them too. To avoid this code duplication, we create a <code>handler</code> macro that creates the wrapper functions for us:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>macro_rules! 
handler { </span><span> ($name: ident) </span><span style="color:#569cd6;">=&gt; </span><span>{{ </span><span> #[naked] </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>wrapper() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> asm!(</span><span style="color:#d69d85;">&quot;mov rdi, rsp </span><span style="color:#d69d85;"> sub rsp, 8 // align the stack pointer </span><span style="color:#d69d85;"> call $0&quot; </span><span> :: </span><span style="color:#d69d85;">&quot;i&quot;</span><span>($name </span><span style="color:#569cd6;">as extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn</span><span>( </span><span> </span><span style="color:#569cd6;">&amp;</span><span>ExceptionStackFrame) -&gt; </span><span style="color:#569cd6;">!</span><span>) </span><span> : </span><span style="color:#d69d85;">&quot;rdi&quot; </span><span>: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>); </span><span> ::core::intrinsics::unreachable(); </span><span> } </span><span> } </span><span> wrapper </span><span> }} </span><span>} </span></code></pre> <p>The macro takes a single Rust identifier (<code>ident</code>) as argument and expands to a <code>{}</code> block (hence the double braces). The block defines a new wrapper function that calls the function <code>$name</code> and passes a pointer to the exception stack frame. Note that we’re fixing the argument type to <code>&amp;ExceptionStackFrame</code>. 
If we used a <code>_</code> like before, the passed function could accept an arbitrary argument, which would lead to ugly bugs at runtime.</p> <p>Now we can remove the <code>divide_by_zero_wrapper</code> and use our new <code>handler!</code> macro instead:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>lazy_static! { </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">IDT</span><span>: idt::Idt = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = idt::Idt::new(); </span><span> idt.set_handler(</span><span style="color:#b5cea8;">0</span><span>, handler!(divide_by_zero_handler)); </span><span style="color:#608b4e;">// new </span><span> idt </span><span> }; </span><span>} </span></code></pre> <p>Note that the <code>handler!</code> macro needs to be defined above the static <code>IDT</code>, because macros are only available after their definition.</p> <h3 id="invalid-opcode-exception"><a class="zola-anchor" href="#invalid-opcode-exception" aria-label="Anchor link for: invalid-opcode-exception">🔗</a>Invalid Opcode Exception</h3> <p>With the <code>handler!</code> macro we can create new handler functions easily. For example, we can add a handler for the invalid opcode exception as follows:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>lazy_static! 
{ </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">IDT</span><span>: idt::Idt = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = idt::Idt::new(); </span><span> idt.set_handler(</span><span style="color:#b5cea8;">0</span><span>, handler!(divide_by_zero_handler)); </span><span> idt.set_handler(</span><span style="color:#b5cea8;">6</span><span>, handler!(invalid_opcode_handler)); </span><span style="color:#608b4e;">// new </span><span> idt </span><span> }; </span><span>} </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>invalid_opcode_handler(stack_frame: </span><span style="color:#569cd6;">&amp;</span><span>ExceptionStackFrame) </span><span> -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> stack_frame = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;</span><span>*stack_frame }; </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">EXCEPTION: INVALID OPCODE at </span><span style="color:#b4cea8;">{:#x}</span><span style="color:#e3bbab;">\n</span><span style="color:#b4cea8;">{:#?}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span> stack_frame.instruction_pointer, stack_frame); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>Invalid opcode faults have the vector number 6, so we set the 6th IDT entry. 
This time we additionally print the address of the invalid instruction.</p> <p>We can test our new handler with the special <a href="https://www.felixcloutier.com/x86/ud">ud2</a> instruction, which generates an invalid opcode:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>rust_main(multiboot_information_address: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> </span><span style="color:#569cd6;">... </span><span> </span><span> </span><span style="color:#608b4e;">// initialize our IDT </span><span> interrupts::init(); </span><span> </span><span> </span><span style="color:#608b4e;">// provoke an invalid opcode exception </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ asm!(</span><span style="color:#d69d85;">&quot;ud2&quot;</span><span>) }; </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <h2 id="exceptions-with-error-codes"><a class="zola-anchor" href="#exceptions-with-error-codes" aria-label="Anchor link for: exceptions-with-error-codes">🔗</a>Exceptions with Error Codes</h2> <p>When a divide-by-zero exception occurs, we immediately know the reason: Someone tried to divide by zero. In contrast, there are faults with many possible causes. For example, a page fault can occur on many occasions: when accessing a non-present page, when writing to a read-only page, when the page table is malformed, etc. 
In order to differentiate these causes, the CPU pushes an additional error code onto the stack for such exceptions, which gives additional information.</p> <h3 id="a-new-macro"><a class="zola-anchor" href="#a-new-macro" aria-label="Anchor link for: a-new-macro">🔗</a>A new Macro</h3> <p>Since the CPU pushes an additional error code, the stack frame is different and our <code>handler!</code> macro is not applicable. Therefore we create a new <code>handler_with_error_code!</code> macro for them:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>macro_rules! handler_with_error_code { </span><span> ($name: ident) </span><span style="color:#569cd6;">=&gt; </span><span>{{ </span><span> #[naked] </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>wrapper() -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> asm!(</span><span style="color:#d69d85;">&quot;pop rsi // pop error code into rsi </span><span style="color:#d69d85;"> mov rdi, rsp </span><span style="color:#d69d85;"> sub rsp, 8 // align the stack pointer </span><span style="color:#d69d85;"> call $0&quot; </span><span> :: </span><span style="color:#d69d85;">&quot;i&quot;</span><span>($name </span><span style="color:#569cd6;">as extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn</span><span>( </span><span> </span><span style="color:#569cd6;">&amp;</span><span>ExceptionStackFrame, </span><span style="color:#569cd6;">u64</span><span>) -&gt; </span><span style="color:#569cd6;">!</span><span>) </span><span> : </span><span style="color:#d69d85;">&quot;rdi&quot;</span><span>,</span><span style="color:#d69d85;">&quot;rsi&quot; </span><span>: </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>); </span><span> ::core::intrinsics::unreachable(); </span><span> } </span><span> } </span><span> wrapper </span><span> }} </span><span>} </span></code></pre> <p>The difference to the <code>handler!</code> macro is the additional error code argument. The CPU pushes the error code last, so we pop it right at the beginning of the wrapper function. 
We pop it into <code>rsi</code> because the C calling convention expects the second argument in it.</p> <h3 id="a-page-fault-handler"><a class="zola-anchor" href="#a-page-fault-handler" aria-label="Anchor link for: a-page-fault-handler">🔗</a>A Page Fault Handler</h3> <p>Let’s write a page fault handler which analyzes and prints the error code:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>page_fault_handler(stack_frame: </span><span style="color:#569cd6;">&amp;</span><span>ExceptionStackFrame, </span><span> error_code: </span><span style="color:#569cd6;">u64</span><span>) -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> println!( </span><span> </span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">EXCEPTION: PAGE FAULT with error code </span><span style="color:#b4cea8;">{:?}</span><span style="color:#e3bbab;">\n</span><span style="color:#b4cea8;">{:#?}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span> error_code, </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;</span><span>*stack_frame }); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>We need to register our new handler function in the static interrupt descriptor table (IDT):</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>lazy_static! 
{ </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">IDT</span><span>: idt::Idt = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = idt::Idt::new(); </span><span> </span><span> idt.set_handler(</span><span style="color:#b5cea8;">0</span><span>, handler!(divide_by_zero_handler)); </span><span> idt.set_handler(</span><span style="color:#b5cea8;">6</span><span>, handler!(invalid_opcode_handler)); </span><span> </span><span style="color:#608b4e;">// new </span><span> idt.set_handler(</span><span style="color:#b5cea8;">14</span><span>, handler_with_error_code!(page_fault_handler)); </span><span> </span><span> idt </span><span> }; </span><span>} </span></code></pre> <p>Page faults have the vector number 14, so we set the 14th IDT entry.</p> <h4 id="testing-it-1"><a class="zola-anchor" href="#testing-it-1" aria-label="Anchor link for: testing-it-1">🔗</a>Testing it</h4> <p>Let’s test our new page fault handler by provoking a page fault in our main function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>rust_main(multiboot_information_address: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> </span><span style="color:#569cd6;">... 
</span><span> </span><span> </span><span style="color:#608b4e;">// initialize our IDT </span><span> interrupts::init(); </span><span> </span><span> </span><span style="color:#608b4e;">// provoke a page fault </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ *(</span><span style="color:#b5cea8;">0xdeadbeaf </span><span style="color:#569cd6;">as *mut u64</span><span>) = </span><span style="color:#b5cea8;">42 </span><span>}; </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>We get the following output:</p> <p><img src="https://os.phil-opp.com/better-exception-messages/qemu-page-fault-handler.png" alt="QEMU: page fault with error code 2 and stack frame dump" /></p> <h3 id="the-page-fault-error-code"><a class="zola-anchor" href="#the-page-fault-error-code" aria-label="Anchor link for: the-page-fault-error-code">🔗</a>The Page Fault Error Code</h3> <p>“Error code 2” is not really a useful error message. Let’s improve this by creating a <code>PageFaultErrorCode</code> type:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>bitflags! { </span><span> </span><span style="color:#569cd6;">struct </span><span>PageFaultErrorCode: u64 { </span><span> const PROTECTION_VIOLATION = 1 &lt;&lt; 0; </span><span> const CAUSED_BY_WRITE = 1 &lt;&lt; 1; </span><span> const USER_MODE = 1 &lt;&lt; 2; </span><span> const MALFORMED_TABLE = 1 &lt;&lt; 3; </span><span> const INSTRUCTION_FETCH = 1 &lt;&lt; 4; </span><span> } </span><span>} </span></code></pre> <ul> <li>When the <code>PROTECTION_VIOLATION</code> flag is set, the page fault was caused e.g. by a write to a read-only page. 
If it’s not set, it was caused by accessing a non-present page.</li> <li>The <code>CAUSED_BY_WRITE</code> flag specifies if the fault was caused by a write (if set) or a read (if not set).</li> <li>The <code>USER_MODE</code> flag is set when the fault occurred in non-privileged mode.</li> <li>The <code>MALFORMED_TABLE</code> flag is set when the page table entry has a 1 in a reserved field.</li> <li>When the <code>INSTRUCTION_FETCH</code> flag is set, the page fault occurred while fetching the next instruction.</li> </ul> <p>Now we can improve our page fault error message by using the new <code>PageFaultErrorCode</code>. We also print the accessed memory address:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>page_fault_handler(stack_frame: </span><span style="color:#569cd6;">&amp;</span><span>ExceptionStackFrame, </span><span> error_code: </span><span style="color:#569cd6;">u64</span><span>) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::registers::control_regs; </span><span> println!( </span><span> </span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">EXCEPTION: PAGE FAULT while accessing </span><span style="color:#b4cea8;">{:#x}</span><span style="color:#d69d85;">\ </span><span style="color:#d69d85;"> </span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">error code: </span><span style="color:#b4cea8;">{:?}</span><span style="color:#e3bbab;">\n</span><span style="color:#b4cea8;">{:#?}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ control_regs::cr2() }, </span><span> PageFaultErrorCode::from_bits(error_code).unwrap(), </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;</span><span>*stack_frame }); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>The <code>from_bits</code> function tries to convert the <code>u64</code> into a <code>PageFaultErrorCode</code>. We use <code>unwrap</code> to panic if the error code has invalid bits set, since this indicates an error in our <code>PageFaultErrorCode</code> definition or a stack corruption. We also print the contents of the <code>cr2</code> register. It contains the accessed memory address, which was the cause of the page fault.</p> <p>Now we get a useful error message when a page fault occurs, which allows us to debug it more easily:</p> <p><img src="https://os.phil-opp.com/better-exception-messages/qemu-page-fault-error-code.png" alt="QEMU: output is now PAGE FAULT with error code CAUSED_BY_WRITE" /></p> <p>As expected, the page fault was caused by a write to <code>0xdeadbeaf</code>. 
The <code>PROTECTION_VIOLATION</code> flag is not set, so the accessed page was not present.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>Now we’re able to catch and analyze various exceptions. The next step is to <em>resolve</em> exceptions, if possible. An example is <a href="https://en.wikipedia.org/wiki/Demand_paging">demand paging</a>: The OS swaps out memory pages to disk so that a page fault occurs when the page is accessed the next time. In that case, the OS can resolve the exception by bringing the page back into memory. Afterwards, the OS resumes the interrupted program as if nothing had happened.</p> <p>The next post will implement the first portion of demand paging: saving and restoring the complete state of a program. This will allow us to transparently interrupt and resume programs in the future.</p> Catching Exceptions Sat, 28 May 2016 00:00:00 +0000 https://os.phil-opp.com/catching-exceptions/ https://os.phil-opp.com/catching-exceptions/ <p>In this post, we start exploring exceptions. We set up an interrupt descriptor table and add handler functions. At the end of this post, our kernel will be able to catch divide-by-zero faults.</p> <span id="continue-reading"></span> <p>As always, the complete source code is on <a href="https://github.com/phil-opp/blog_os/tree/catching_exceptions">GitHub</a>. Please file <a href="https://github.com/phil-opp/blog_os/issues">issues</a> for any problems, questions, or improvement suggestions. There is also a comment section at the end of this page.</p> <blockquote> <p><strong>Note</strong>: This post describes how to handle exceptions using naked functions (see <a href="https://os.phil-opp.com/edition-1/extra/naked-exceptions/">“Handling Exceptions with Naked Functions”</a> for an overview). 
Our new way of handling exceptions can be found in the <a href="https://os.phil-opp.com/handling-exceptions/">“Handling Exceptions”</a> post.</p> </blockquote> <h2 id="exceptions"><a class="zola-anchor" href="#exceptions" aria-label="Anchor link for: exceptions">🔗</a>Exceptions</h2> <p>An exception signals that something is wrong with the current instruction. For example, the CPU issues an exception if the current instruction tries to divide by 0. When an exception occurs, the CPU interrupts its current work and immediately calls a specific exception handler function, depending on the exception type.</p> <p>We’ve already seen several types of exceptions in our kernel:</p> <ul> <li><strong>Invalid Opcode</strong>: This exception occurs when the current instruction is invalid. For example, this exception occurred when we tried to use SSE instructions before enabling SSE. Without SSE, the CPU didn’t know the <code>movups</code> and <code>movaps</code> instructions, so it threw an exception when it stumbled over them.</li> <li><strong>Page Fault</strong>: A page fault occurs on illegal memory accesses. For example, if the current instruction tries to read from an unmapped page or tries to write to a read-only page.</li> <li><strong>Double Fault</strong>: When an exception occurs, the CPU tries to call the corresponding handler function. If another exception occurs <em>while calling the exception handler</em>, the CPU raises a double fault exception. This exception also occurs when there is no handler function registered for an exception.</li> <li><strong>Triple Fault</strong>: If an exception occurs while the CPU tries to call the double fault handler function, it issues a fatal <em>triple fault</em>. We can’t catch or handle a triple fault. Most processors react by resetting themselves and rebooting the operating system. 
This causes the bootloops we experienced in the previous posts.</li> </ul> <p>For the full list of exceptions check out the <a href="https://wiki.osdev.org/Exceptions">OSDev wiki</a>.</p> <h3 id="the-interrupt-descriptor-table"><a class="zola-anchor" href="#the-interrupt-descriptor-table" aria-label="Anchor link for: the-interrupt-descriptor-table">🔗</a>The Interrupt Descriptor Table</h3> <p>In order to catch and handle exceptions, we have to set up a so-called <em>Interrupt Descriptor Table</em> (IDT). In this table we can specify a handler function for each CPU exception. The hardware uses this table directly, so we need to follow a predefined format. Each entry must have the following 16-byte structure:</p> <table><thead><tr><th>Type</th><th>Name</th><th>Description</th></tr></thead><tbody> <tr><td>u16</td><td>Function Pointer [0:15]</td><td>The lower bits of the pointer to the handler function.</td></tr> <tr><td>u16</td><td>GDT selector</td><td>Selector of a code segment in the GDT.</td></tr> <tr><td>u16</td><td>Options</td><td>(see below)</td></tr> <tr><td>u16</td><td>Function Pointer [16:31]</td><td>The middle bits of the pointer to the handler function.</td></tr> <tr><td>u32</td><td>Function Pointer [32:63]</td><td>The remaining bits of the pointer to the handler function.</td></tr> <tr><td>u32</td><td>Reserved</td><td></td></tr> </tbody></table> <p>The options field has the following format:</p> <table><thead><tr><th>Bits</th><th>Name</th><th>Description</th></tr></thead><tbody> <tr><td>0-2</td><td>Interrupt Stack Table Index</td><td>0: Don’t switch stacks, 1-7: Switch to the n-th stack in the Interrupt Stack Table when this handler is called.</td></tr> <tr><td>3-7</td><td>Reserved</td><td></td></tr> <tr><td>8</td><td>0: Interrupt Gate, 1: Trap Gate</td><td>If this bit is 0, interrupts are disabled when this handler is called.</td></tr> <tr><td>9-11</td><td>must be one</td><td></td></tr> <tr><td>12</td><td>must be zero</td><td></td></tr> 
<tr><td>13‑14</td><td>Descriptor Privilege Level (DPL)</td><td>The minimal privilege level required for calling this handler.</td></tr> <tr><td>15</td><td>Present</td><td></td></tr> </tbody></table> <p>Each exception has a predefined IDT index. For example the invalid opcode exception has table index 6 and the page fault exception has table index 14. Thus, the hardware can automatically load the corresponding IDT entry for each exception. The <a href="https://wiki.osdev.org/Exceptions">Exception Table</a> in the OSDev wiki shows the IDT indexes of all exceptions in the “Vector nr.” column.</p> <p>When an exception occurs, the CPU roughly does the following:</p> <ol> <li>Read the corresponding entry from the Interrupt Descriptor Table (IDT). For example, the CPU reads the 14-th entry when a page fault occurs.</li> <li>Check if the entry is present. Raise a double fault if not.</li> <li>Push some registers on the stack, including the instruction pointer and the <a href="https://en.wikipedia.org/wiki/FLAGS_register">EFLAGS</a> register. (We will use these values in a future post.)</li> <li>Disable interrupts if the entry is an interrupt gate (bit 40 not set).</li> <li>Load the specified GDT selector into the CS segment.</li> <li>Jump to the specified handler function.</li> </ol> <h2 id="handling-exceptions"><a class="zola-anchor" href="#handling-exceptions" aria-label="Anchor link for: handling-exceptions">🔗</a>Handling Exceptions</h2> <p>Let’s try to catch and handle CPU exceptions. We start by creating a new <code>interrupts</code> module with an <code>idt</code> submodule:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span style="color:#569cd6;">... </span><span style="color:#569cd6;">mod </span><span>interrupts; </span><span style="color:#569cd6;">... 
</span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// src/interrupts/mod.rs </span><span> </span><span style="color:#569cd6;">mod </span><span>idt; </span></code></pre> <p>Now we create types for the IDT and its entries:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// src/interrupts/idt.rs </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::segmentation; </span><span style="color:#569cd6;">use </span><span>x86_64::structures::gdt::SegmentSelector; </span><span style="color:#569cd6;">use </span><span>x86_64::PrivilegeLevel; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>Idt([Entry; 16]); </span><span> </span><span>#[derive(Debug, Clone, Copy)] </span><span>#[repr(C, packed)] </span><span style="color:#569cd6;">pub struct </span><span>Entry { </span><span> pointer_low: </span><span style="color:#569cd6;">u16</span><span>, </span><span> gdt_selector: SegmentSelector, </span><span> options: EntryOptions, </span><span> pointer_middle: </span><span style="color:#569cd6;">u16</span><span>, </span><span> pointer_high: </span><span style="color:#569cd6;">u32</span><span>, </span><span> reserved: </span><span style="color:#569cd6;">u32</span><span>, </span><span>} </span></code></pre> <p>The IDT is variable sized and can have up to 256 entries. We only need the first 16 entries in this post, so we define the table as <code>[Entry; 16]</code>. The remaining 240 handlers are treated as non-present by the CPU.</p> <p>The <code>Entry</code> type is the translation of the above table to Rust. The <code>repr(C, packed)</code> attribute ensures that the compiler keeps the field ordering and does not add any padding between them. 
Instead of describing the <code>gdt_selector</code> as a plain <code>u16</code>, we use the <code>SegmentSelector</code> type of the <code>x86_64</code> crate. We also merge bits 32 to 47 into an <code>options</code> field, because Rust has no <code>u3</code> or <code>u1</code> type. The <code>EntryOptions</code> type is described below:</p> <h3 id="entry-options"><a class="zola-anchor" href="#entry-options" aria-label="Anchor link for: entry-options">🔗</a>Entry Options</h3> <p>The <code>EntryOptions</code> type has the following skeleton:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[derive(Debug, Clone, Copy)] </span><span style="color:#569cd6;">pub struct </span><span>EntryOptions(</span><span style="color:#569cd6;">u16</span><span>); </span><span> </span><span style="color:#569cd6;">impl </span><span>EntryOptions { </span><span> </span><span style="color:#569cd6;">fn </span><span>new() -&gt; </span><span style="color:#569cd6;">Self </span><span>{</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>set_present(</span><span style="color:#569cd6;">&amp;mut </span><span>self, present: </span><span style="color:#569cd6;">bool</span><span>) {</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>disable_interrupts(</span><span style="color:#569cd6;">&amp;mut </span><span>self, disable: </span><span style="color:#569cd6;">bool</span><span>) {</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>set_privilege_level(</span><span style="color:#569cd6;">&amp;mut </span><span>self, dpl: </span><span style="color:#569cd6;">u16</span><span>) {</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span> 
</span><span style="color:#569cd6;">pub fn </span><span>set_stack_index(</span><span style="color:#569cd6;">&amp;mut </span><span>self, index: </span><span style="color:#569cd6;">u16</span><span>) {</span><span style="color:#569cd6;">...</span><span>} </span><span>} </span></code></pre> <p>The implementations of these methods need to modify the correct bits of the <code>u16</code> without touching the other bits. For example, we would need the following bit-fiddling to set the stack index:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>self.</span><span style="color:#b5cea8;">0 </span><span>= (self.</span><span style="color:#b5cea8;">0 </span><span style="color:#569cd6;">&amp; </span><span style="color:#b5cea8;">0xfff8</span><span>) </span><span style="color:#569cd6;">|</span><span> stack_index; </span></code></pre> <p>Or alternatively:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>self.</span><span style="color:#b5cea8;">0 </span><span>= (self.</span><span style="color:#b5cea8;">0 </span><span style="color:#569cd6;">&amp; </span><span>(</span><span style="color:#569cd6;">!</span><span style="color:#b5cea8;">0b111</span><span>)) </span><span style="color:#569cd6;">|</span><span> stack_index; </span></code></pre> <p>Or:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>self.</span><span style="color:#b5cea8;">0 </span><span>= ((self.</span><span style="color:#b5cea8;">0 </span><span>&gt;&gt; </span><span style="color:#b5cea8;">3</span><span>) &lt;&lt; </span><span style="color:#b5cea8;">3</span><span>) </span><span style="color:#569cd6;">|</span><span> stack_index; </span></code></pre> <p>Well, none of these variants is really <em>readable</em> and it’s very easy to make 
mistakes somewhere. Therefore, I created a <code>BitField</code> trait that provides the following <a href="https://doc.rust-lang.org/nightly/core/ops/struct.Range.html">Range</a>-based API:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>self.</span><span style="color:#b5cea8;">0.</span><span>set_bits(</span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">3</span><span>, stack_index); </span></code></pre> <p>I think it is much more readable, since we abstracted away all bit-masking details. The <code>BitField</code> trait is contained in the <a href="https://crates.io/crates/bit_field">bit_field</a> crate. (It’s pretty new, so it might still contain bugs.) To add it as a dependency, we run <code>cargo add bit_field</code> and add <code>extern crate bit_field;</code> to our <code>src/lib.rs</code>.</p> <p>Now we can use the trait to implement the methods of <code>EntryOptions</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/idt.rs </span><span> </span><span style="color:#569cd6;">use </span><span>bit_field::BitField; </span><span> </span><span>#[derive(Debug, Clone, Copy)] </span><span style="color:#569cd6;">pub struct </span><span>EntryOptions(</span><span style="color:#569cd6;">u16</span><span>); </span><span> </span><span style="color:#569cd6;">impl </span><span>EntryOptions { </span><span> </span><span style="color:#569cd6;">fn </span><span>minimal() -&gt; </span><span style="color:#569cd6;">Self </span><span>{ </span><span> </span><span style="color:#569cd6;">let mut</span><span> options = </span><span style="color:#b5cea8;">0</span><span>; </span><span> options.set_bits(</span><span style="color:#b5cea8;">9</span><span style="color:#569cd6;">..</span><span 
style="color:#b5cea8;">12</span><span>, </span><span style="color:#b5cea8;">0b111</span><span>); </span><span style="color:#608b4e;">// &#39;must-be-one&#39; bits </span><span> EntryOptions(options) </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>new() -&gt; </span><span style="color:#569cd6;">Self </span><span>{ </span><span> </span><span style="color:#569cd6;">let mut</span><span> options = </span><span style="color:#569cd6;">Self</span><span>::minimal(); </span><span> options.set_present(</span><span style="color:#569cd6;">true</span><span>).disable_interrupts(</span><span style="color:#569cd6;">true</span><span>); </span><span> options </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>set_present(</span><span style="color:#569cd6;">&amp;mut </span><span>self, present: </span><span style="color:#569cd6;">bool</span><span>) -&gt; </span><span style="color:#569cd6;">&amp;mut Self </span><span>{ </span><span> self.</span><span style="color:#b5cea8;">0.</span><span>set_bit(</span><span style="color:#b5cea8;">15</span><span>, present); </span><span> self </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>disable_interrupts(</span><span style="color:#569cd6;">&amp;mut </span><span>self, disable: </span><span style="color:#569cd6;">bool</span><span>) -&gt; </span><span style="color:#569cd6;">&amp;mut Self </span><span>{ </span><span> self.</span><span style="color:#b5cea8;">0.</span><span>set_bit(</span><span style="color:#b5cea8;">8</span><span>, </span><span style="color:#569cd6;">!</span><span>disable); </span><span> self </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>set_privilege_level(</span><span style="color:#569cd6;">&amp;mut </span><span>self, dpl: </span><span style="color:#569cd6;">u16</span><span>) -&gt; </span><span style="color:#569cd6;">&amp;mut Self 
</span><span>{ </span><span> self.</span><span style="color:#b5cea8;">0.</span><span>set_bits(</span><span style="color:#b5cea8;">13</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">15</span><span>, dpl); </span><span> self </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>set_stack_index(</span><span style="color:#569cd6;">&amp;mut </span><span>self, index: </span><span style="color:#569cd6;">u16</span><span>) -&gt; </span><span style="color:#569cd6;">&amp;mut Self </span><span>{ </span><span> self.</span><span style="color:#b5cea8;">0.</span><span>set_bits(</span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">3</span><span>, index); </span><span> self </span><span> } </span><span>} </span></code></pre> <p>Note that the ranges are <em>exclusive</em> of the upper bound, so <code>13..15</code> covers bits 13 and 14. The <code>minimal</code> function creates an <code>EntryOptions</code> value with only the “must-be-one” bits set. The <code>new</code> function, on the other hand, chooses reasonable defaults: It sets the present bit (why would you want to create a non-present entry?) and disables interrupts (normally we don’t want our exception handlers to be interrupted). 
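</p>
<p>To make the resulting bit patterns concrete, here is a minimal host-side sketch that mirrors <code>minimal</code> and <code>new</code> with plain masks instead of the <code>bit_field</code> crate. The names <code>OptionsSketch</code>, <code>raw</code> and <code>set_bit</code> are made up for this illustration and are not part of the kernel code; the bit positions are the ones used above.</p>

```rust
/// Hypothetical stand-in for `EntryOptions`, using plain bit masks.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OptionsSketch(u16);

impl OptionsSketch {
    /// Only the "must-be-one" bits 9, 10 and 11 are set.
    pub fn minimal() -> Self {
        OptionsSketch(0b111u16 << 9)
    }

    /// Defaults: present bit set, interrupts disabled.
    pub fn new() -> Self {
        let mut options = Self::minimal();
        options.set_present(true).disable_interrupts(true);
        options
    }

    pub fn set_present(&mut self, present: bool) -> &mut Self {
        self.set_bit(15, present)
    }

    /// Bit 8 cleared means "interrupt gate", so the flag is inverted.
    pub fn disable_interrupts(&mut self, disable: bool) -> &mut Self {
        self.set_bit(8, !disable)
    }

    /// Bits 13 and 14 hold the privilege level (the range 13..15).
    pub fn set_privilege_level(&mut self, dpl: u16) -> &mut Self {
        self.0 = (self.0 & !(0b11u16 << 13)) | ((dpl & 0b11) << 13);
        self
    }

    /// Returns the raw 16-bit value for inspection.
    pub fn raw(&self) -> u16 {
        self.0
    }

    fn set_bit(&mut self, bit: u16, value: bool) -> &mut Self {
        if value {
            self.0 |= 1u16 << bit;
        } else {
            self.0 &= !(1u16 << bit);
        }
        self
    }
}
```

<p>With these masks, <code>OptionsSketch::minimal().raw()</code> is <code>0x0e00</code> and <code>OptionsSketch::new().raw()</code> is <code>0x8e00</code>.</p>
<p>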
By returning <code>&amp;mut Self</code> from the <code>set_*</code> methods, we allow easy method chaining such as <code>options.set_present(true).disable_interrupts(true)</code>.</p> <h3 id="creating-idt-entries"><a class="zola-anchor" href="#creating-idt-entries" aria-label="Anchor link for: creating-idt-entries">🔗</a>Creating IDT Entries</h3> <p>Now we can add a function to create new IDT entries:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">impl </span><span>Entry { </span><span> </span><span style="color:#569cd6;">fn </span><span>new(gdt_selector: SegmentSelector, handler: HandlerFunc) -&gt; </span><span style="color:#569cd6;">Self </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> pointer = handler </span><span style="color:#569cd6;">as u64</span><span>; </span><span> Entry { </span><span> gdt_selector: gdt_selector, </span><span> pointer_low: pointer </span><span style="color:#569cd6;">as u16</span><span>, </span><span> pointer_middle: (pointer &gt;&gt; </span><span style="color:#b5cea8;">16</span><span>) </span><span style="color:#569cd6;">as u16</span><span>, </span><span> pointer_high: (pointer &gt;&gt; </span><span style="color:#b5cea8;">32</span><span>) </span><span style="color:#569cd6;">as u32</span><span>, </span><span> options: EntryOptions::new(), </span><span> reserved: </span><span style="color:#b5cea8;">0</span><span>, </span><span> } </span><span> } </span><span>} </span></code></pre> <p>We take a GDT selector and a handler function as arguments and create a new IDT entry from them. The <code>HandlerFunc</code> type is described below. It is a function pointer that can be converted to a <code>u64</code>. We choose the lower 16 bits for <code>pointer_low</code>, the next 16 bits for <code>pointer_middle</code> and the remaining 32 bits for <code>pointer_high</code>. 
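</p>
<p>The address split can be checked in isolation. The following host-side sketch uses two hypothetical helpers, <code>split_pointer</code> and <code>join_pointer</code>, which do not appear in the kernel code; they only mirror the shifts and casts from <code>Entry::new</code> and show that no address bits get lost.</p>

```rust
/// Splits a 64-bit address into (low 16, middle 16, high 32) bits.
pub fn split_pointer(pointer: u64) -> (u16, u16, u32) {
    (
        pointer as u16,         // `as` truncates, keeping bits 0..16
        (pointer >> 16) as u16, // bits 16..32
        (pointer >> 32) as u32, // bits 32..64
    )
}

/// Reassembles the address from the three parts of an entry.
pub fn join_pointer(low: u16, middle: u16, high: u32) -> u64 {
    (low as u64) | ((middle as u64) << 16) | ((high as u64) << 32)
}
```

<p>For example, the address <code>0x1234_5678_9abc_def0</code> is split into <code>0xdef0</code>, <code>0x9abc</code> and <code>0x1234_5678</code>, and joining the parts yields the original address again.</p>
<p>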
For the options field we choose our default options, i.e. present with interrupts disabled.</p> <h3 id="the-handler-function-type"><a class="zola-anchor" href="#the-handler-function-type" aria-label="Anchor link for: the-handler-function-type">🔗</a>The Handler Function Type</h3> <p>The <code>HandlerFunc</code> type is a type alias for a function type:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub type </span><span style="color:#4ec9b0;">HandlerFunc </span><span>= </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn</span><span>() -&gt; </span><span style="color:#569cd6;">!</span><span>; </span></code></pre> <p>It needs to be a function with a defined <a href="https://en.wikipedia.org/wiki/Calling_convention">calling convention</a>, as it is called directly by the hardware. The C calling convention is the de facto standard in OS development, so we’re using it, too. The function takes no arguments, since the hardware doesn’t supply any arguments when jumping to the handler function.</p> <p>It is important that the function is <a href="https://doc.rust-lang.org/rust-by-example/fn/diverging.html">diverging</a>, i.e. it must never return. The reason is that the hardware doesn’t <em>call</em> the handler functions; it just <em>jumps</em> to them after pushing some values to the stack. So our stack might look different:</p> <p><img src="https://os.phil-opp.com/catching-exceptions/normal-vs-interrupt-function-return.svg" alt="normal function return vs interrupt function return" /></p> <p>If our handler function returned normally, it would try to pop the return address from the stack. But it might get some completely different value then. For example, the CPU pushes an error code for some exceptions. 
Bad things would happen if we interpreted this error code as a return address and jumped to it. Therefore, interrupt handler functions must diverge<sup class="footnote-reference"><a href="#fn-must-diverge">1</a></sup>.</p> <div class="footnote-definition" id="fn-must-diverge"><sup class="footnote-definition-label">1</sup> <p>Another reason is that we overwrite the current register values by executing the handler function. Thus, the interrupted function loses its state and can’t proceed anyway.</p> </div> <h3 id="idt-methods"><a class="zola-anchor" href="#idt-methods" aria-label="Anchor link for: idt-methods">🔗</a>IDT methods</h3> <p>Let’s add a function to create new interrupt descriptor tables:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">impl </span><span>Idt { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>new() -&gt; Idt { </span><span> Idt([Entry::missing(); </span><span style="color:#b5cea8;">16</span><span>]) </span><span> } </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>Entry { </span><span> </span><span style="color:#569cd6;">fn </span><span>missing() -&gt; </span><span style="color:#569cd6;">Self </span><span>{ </span><span> Entry { </span><span> gdt_selector: SegmentSelector::new(</span><span style="color:#b5cea8;">0</span><span>, PrivilegeLevel::Ring0), </span><span> pointer_low: </span><span style="color:#b5cea8;">0</span><span>, </span><span> pointer_middle: </span><span style="color:#b5cea8;">0</span><span>, </span><span> pointer_high: </span><span style="color:#b5cea8;">0</span><span>, </span><span> options: EntryOptions::minimal(), </span><span> reserved: </span><span style="color:#b5cea8;">0</span><span>, </span><span> } </span><span> } </span><span>} </span></code></pre> <p>The <code>missing</code> function creates a non-present <code>Entry</code>. 
We could choose any values for the pointer and GDT selector fields as long as the present bit is not set.</p> <p>However, a table with only non-present entries is not very useful. So we create a <code>set_handler</code> method to add new handler functions:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">impl </span><span>Idt { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>set_handler(</span><span style="color:#569cd6;">&amp;mut </span><span>self, entry: </span><span style="color:#569cd6;">u8</span><span>, handler: HandlerFunc) </span><span> -&gt; </span><span style="color:#569cd6;">&amp;mut</span><span> EntryOptions </span><span> { </span><span> self.</span><span style="color:#b5cea8;">0</span><span>[entry </span><span style="color:#569cd6;">as usize</span><span>] = Entry::new(segmentation::cs(), handler); </span><span> </span><span style="color:#569cd6;">&amp;mut </span><span>self.</span><span style="color:#b5cea8;">0</span><span>[entry </span><span style="color:#569cd6;">as usize</span><span>].options </span><span> } </span><span>} </span></code></pre> <p>The method overwrites the specified entry with the given handler function. We use the <code>segmentation::cs</code> function of the <a href="https://docs.rs/x86_64">x86_64 crate</a> to get the current code segment selector. There’s no need for different kernel code segments in long mode, so the current <code>cs</code> value should always be the right choice.</p> <p>By returning a mutable reference to the entry’s options, we allow the caller to override the default settings. 
For example, the caller could add a non-present entry by executing: <code>idt.set_handler(11, handler_fn).set_present(false)</code>.</p> <h3 id="loading-the-idt"><a class="zola-anchor" href="#loading-the-idt" aria-label="Anchor link for: loading-the-idt">🔗</a>Loading the IDT</h3> <p>Now we’re able to create new interrupt descriptor tables with registered handler functions. We just need a way to load an IDT, so that the CPU uses it. The x86 architecture uses a special register to store the active IDT and its length. In order to load a new IDT we need to update this register through the <a href="https://www.felixcloutier.com/x86/lgdt:lidt">lidt</a> instruction.</p> <p>The <code>lidt</code> instruction expects a pointer to a special data structure, which specifies the start address of the IDT and its length:</p> <table><thead><tr><th>Type</th><th>Name</th><th>Description</th></tr></thead><tbody> <tr><td>u16</td><td>Limit</td><td>The maximum addressable byte in the table. Equal to the table size in bytes minus 1.</td></tr> <tr><td>u64</td><td>Offset</td><td>Virtual start address of the table.</td></tr> </tbody></table> <p>This structure is already contained <a href="https://docs.rs/x86_64/0.1.0/x86_64/instructions/tables/struct.DescriptorTablePointer.html">in the x86_64 crate</a>, so we don’t need to create it ourselves. The same is true for the <a href="https://docs.rs/x86_64/0.1.0/x86_64/instructions/tables/fn.lidt.html">lidt function</a>. 
So we just need to put the pieces together to create a <code>load</code> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">impl </span><span>Idt { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>load(</span><span style="color:#569cd6;">&amp;</span><span>self) { </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::tables::{DescriptorTablePointer, lidt}; </span><span> </span><span style="color:#569cd6;">use </span><span>core::mem::size_of; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> ptr = DescriptorTablePointer { </span><span> base: self </span><span style="color:#569cd6;">as *const _ as u64</span><span>, </span><span> limit: (size_of::&lt;</span><span style="color:#569cd6;">Self</span><span>&gt;() - </span><span style="color:#b5cea8;">1</span><span>) </span><span style="color:#569cd6;">as u16</span><span>, </span><span> }; </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ lidt(</span><span style="color:#569cd6;">&amp;</span><span>ptr) }; </span><span> } </span><span>} </span></code></pre> <p>The method does not need to modify the IDT, so it takes <code>self</code> by immutable reference. First, we create a <code>DescriptorTablePointer</code> and then we pass it to <code>lidt</code>. The <code>lidt</code> function expects that the <code>base</code> field has the type <code>u64</code>, therefore we need to cast the <code>self</code> pointer. For calculating the <code>limit</code> we use <a href="https://doc.rust-lang.org/nightly/core/mem/fn.size_of.html">mem::size_of</a>. The additional <code>-1</code> is needed because the limit field has to be the maximum addressable byte (inclusive bound). 
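</p>
<p>The limit arithmetic can be verified on the host. The sketch below uses simplified stand-ins, <code>EntrySketch</code> and <code>IdtSketch</code>, which are made up for this illustration: the fields are plain integers, under the assumption that <code>SegmentSelector</code> and <code>EntryOptions</code> are each 16 bits wide.</p>

```rust
use std::mem::size_of_val;

/// Simplified stand-in for `Entry`: same field widths, plain integers.
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(C, packed)]
pub struct EntrySketch {
    pointer_low: u16,
    gdt_selector: u16,
    options: u16,
    pointer_middle: u16,
    pointer_high: u32,
    reserved: u32,
}

/// Simplified stand-in for `Idt` with 16 entries.
pub struct IdtSketch([EntrySketch; 16]);

impl IdtSketch {
    pub fn new() -> Self {
        let empty = EntrySketch {
            pointer_low: 0,
            gdt_selector: 0,
            options: 0,
            pointer_middle: 0,
            pointer_high: 0,
            reserved: 0,
        };
        IdtSketch([empty; 16])
    }

    /// Highest addressable byte offset inside the table (inclusive).
    pub fn limit(&self) -> u16 {
        (size_of_val(self) - 1) as u16
    }
}
```

<p>Each entry is 16 bytes, so 16 entries occupy 256 bytes and the limit is 255.</p>
<p>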
We need an unsafe block around <code>lidt</code>, because the function assumes that the specified handler addresses are valid.</p> <h4 id="safety"><a class="zola-anchor" href="#safety" aria-label="Anchor link for: safety">🔗</a>Safety</h4> <p>But can we really guarantee that handler addresses are always valid? Let’s see:</p> <ul> <li>The <code>Idt::new</code> function creates a new table populated with non-present entries. There’s no way to set these entries to present from outside of this module, so this function is fine.</li> <li>The <code>set_handler</code> method allows us to overwrite a specified entry and point it to some handler function. Rust’s type system guarantees that function pointers are always valid (as long as no <code>unsafe</code> is involved), so this function is fine, too.</li> </ul> <p>There are no other public functions in the <code>idt</code> module (except <code>load</code>), so it should be safe… right?</p> <p>Wrong! Imagine the following scenario:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>init() { </span><span> load_idt(); </span><span> cause_page_fault(); </span><span>} </span><span> </span><span style="color:#569cd6;">fn </span><span>load_idt() { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = idt::Idt::new(); </span><span> idt.set_handler(</span><span style="color:#b5cea8;">14</span><span>, page_fault_handler); </span><span> idt.load(); </span><span>} </span><span> </span><span style="color:#569cd6;">fn </span><span>cause_page_fault() { </span><span> </span><span style="color:#569cd6;">let</span><span> x = [</span><span style="color:#b5cea8;">1</span><span>,</span><span style="color:#b5cea8;">2</span><span>,</span><span style="color:#b5cea8;">3</span><span>,</span><span style="color:#b5cea8;">4</span><span>,</span><span 
style="color:#b5cea8;">5</span><span>,</span><span style="color:#b5cea8;">6</span><span>,</span><span style="color:#b5cea8;">7</span><span>,</span><span style="color:#b5cea8;">8</span><span>,</span><span style="color:#b5cea8;">9</span><span>]; </span><span> </span><span style="color:#569cd6;">unsafe</span><span>{ *(</span><span style="color:#b5cea8;">0xdeadbeaf </span><span style="color:#569cd6;">as *mut u64</span><span>) = x[</span><span style="color:#b5cea8;">4</span><span>] }; </span><span>} </span></code></pre> <p>This won’t work. If we’re lucky, we get a triple fault and a boot loop. If we’re unlucky, our kernel does strange things and fails at some completely unrelated place. So what’s the problem here?</p> <p>Well, we construct an IDT <em>on the stack</em> and load it. It is perfectly valid until the end of the <code>load_idt</code> function. But as soon as the function returns, its stack frame can be reused by other functions. Thus, the IDT gets overwritten by the stack frame of the <code>cause_page_fault</code> function. So when the page fault occurs and the CPU tries to read the entry, it only sees some garbage values and issues a double fault, which escalates to a triple fault and a CPU reset.</p> <p>Now imagine that the <code>cause_page_fault</code> function declared an array of pointers instead. If the present bit happened to be set in the overwritten entry, the CPU would jump to some random pointer and interpret random memory as code. This would be a clear violation of memory safety.</p> <h4 id="fixing-the-load-method"><a class="zola-anchor" href="#fixing-the-load-method" aria-label="Anchor link for: fixing-the-load-method">🔗</a>Fixing the load method</h4> <p>So how do we fix it? We could make the load function itself <code>unsafe</code> and push the unsafety to the caller. However, there is a much better solution in this case. 
In order to see it, we formulate the requirement for the <code>load</code> method:</p> <blockquote> <p>The referenced IDT must be valid until a new IDT is loaded.</p> </blockquote> <p>We can’t know when the next IDT will be loaded. Maybe never. So in the worst case:</p> <blockquote> <p>The referenced IDT must be valid as long as our kernel runs.</p> </blockquote> <p>This is exactly the definition of a <a href="https://doc.rust-lang.org/rust-by-example/scope/lifetime/static_lifetime.html">static lifetime</a>. So we can easily ensure that the IDT lives long enough by adding a <code>'static</code> requirement to the signature of the <code>load</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>load(</span><span style="color:#569cd6;">&amp;&#39;static </span><span>self) {</span><span style="color:#569cd6;">...</span><span>} </span><span style="color:#608b4e;">// ^^^^^^^ ensure that the IDT reference has the &#39;static lifetime </span></code></pre> <p>That’s it! Now the Rust compiler ensures that the above error can’t happen anymore:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: `idt` does not live long enough </span><span> --&gt; src/interrupts/mod.rs:78:5 </span><span>78 |&gt; idt.load(); </span><span> |&gt; ^^^ </span><span>note: reference must be valid for the static lifetime... </span><span>note: ...but borrowed value is only valid for the block suffix following </span><span> statement 0 at 75:34 </span><span> --&gt; src/interrupts/mod.rs:75:35 </span><span>75 |&gt; let mut idt = idt::Idt::new(); </span><span> |&gt; ^ </span></code></pre> <h3 id="a-static-idt"><a class="zola-anchor" href="#a-static-idt" aria-label="Anchor link for: a-static-idt">🔗</a>A static IDT</h3> <p>So a valid IDT needs to have the <code>'static</code> lifetime. 
We can either create a <code>static</code> IDT or <a href="https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.into_raw">deliberately leak a Box</a>. We will most likely only need a single IDT for the foreseeable future, so let’s try the <code>static</code> approach:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">IDT</span><span>: idt::Idt = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = idt::Idt::new(); </span><span> </span><span> idt.set_handler(</span><span style="color:#b5cea8;">0</span><span>, divide_by_zero_handler); </span><span> </span><span> idt </span><span>}; </span><span> </span><span style="color:#569cd6;">extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>divide_by_zero_handler() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;EXCEPTION: DIVIDE BY ZERO&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>We register a single handler function for a <a href="https://wiki.osdev.org/Exceptions#Division_Error">divide by zero error</a> (index 0). Like the name says, this exception occurs when dividing a number by 0. Thus we have an easy way to test our new exception handler.</p> <p>However, it doesn’t work this way:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: calls in statics are limited to constant functions, struct and enum </span><span> constructors [E0015] </span><span>... </span><span>error: blocks in statics are limited to items and tail expressions [E0016] </span><span>... 
</span><span>error: references in statics may only refer to immutable values [E0017] </span><span>... </span></code></pre> <p>The reason is that the Rust compiler is not able to evaluate the value of the <code>static</code> at compile time. Maybe it will work someday when <code>const</code> functions become more powerful. But until then, we have to find another solution.</p> <h4 id="lazy-statics-to-the-rescue"><a class="zola-anchor" href="#lazy-statics-to-the-rescue" aria-label="Anchor link for: lazy-statics-to-the-rescue">🔗</a>Lazy Statics to the Rescue</h4> <p>Fortunately the <code>lazy_static</code> macro exists. Instead of evaluating a <code>static</code> at compile time, the macro performs the initialization when the <code>static</code> is referenced the first time. Thus, we can do almost everything in the initialization block and are even able to read runtime values.</p> <p>Let’s add the <code>lazy_static</code> crate to our project:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span>#[macro_use] </span><span style="color:#569cd6;">extern crate</span><span> lazy_static; </span></code></pre> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies.lazy_static</span><span>] </span><span style="color:#569cd6;">version </span><span>= </span><span style="color:#d69d85;">&quot;0.2.1&quot; </span><span style="color:#569cd6;">features </span><span>= [</span><span style="color:#d69d85;">&quot;spin_no_std&quot;</span><span>] </span></code></pre> <p>We need the <code>spin_no_std</code> feature, since we don’t link the standard library.</p> <p>With <code>lazy_static</code>, we can define our IDT without 
problems:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span>lazy_static! { </span><span> </span><span style="color:#569cd6;">static ref </span><span style="color:#b4cea8;">IDT</span><span>: idt::Idt = { </span><span> </span><span style="color:#569cd6;">let mut</span><span> idt = idt::Idt::new(); </span><span> </span><span> idt.set_handler(</span><span style="color:#b5cea8;">0</span><span>, divide_by_zero_handler); </span><span> </span><span> idt </span><span> }; </span><span>} </span></code></pre> <p>Now we’re ready to load our IDT! To do so, we add an <code>interrupts::init</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/interrupts/mod.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init() { </span><span> </span><span style="color:#b4cea8;">IDT</span><span>.load(); </span><span>} </span></code></pre> <p>We don’t need our <code>assert_has_not_been_called</code> macro here, since nothing bad happens when <code>init</code> is called twice. It just reloads the same IDT again.</p> <h2 id="testing-it"><a class="zola-anchor" href="#testing-it" aria-label="Anchor link for: testing-it">🔗</a>Testing it</h2> <p>Now we should be able to catch divide-by-zero exceptions! Let’s try it in our <code>rust_main</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>rust_main(...) { </span><span> </span><span style="color:#569cd6;">... 
</span><span> memory::init(boot_info); </span><span> </span><span> </span><span style="color:#608b4e;">// initialize our IDT </span><span> interrupts::init(); </span><span> </span><span> </span><span style="color:#608b4e;">// provoke a divide-by-zero fault </span><span> </span><span style="color:#b5cea8;">42 </span><span>/ </span><span style="color:#b5cea8;">0</span><span>; </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>When we run it, we get a runtime panic:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>PANIC in src/lib.rs at line 57: </span><span> attempted to divide by zero </span></code></pre> <p>That’s not our exception handler. The reason is that Rust itself checks for a possible division by zero and panics in that case. So in order to raise a divide-by-zero error in the CPU, we need to bypass the Rust compiler somehow.</p> <h3 id="inline-assembly"><a class="zola-anchor" href="#inline-assembly" aria-label="Anchor link for: inline-assembly">🔗</a>Inline Assembly</h3> <p>In order to cause a divide-by-zero exception, we need to execute a <a href="https://www.felixcloutier.com/x86/div">div</a> or <a href="https://www.felixcloutier.com/x86/idiv">idiv</a> assembly instruction with operand 0. We could write a small assembly function and call it from our Rust code. An easier way is to use Rust’s <a href="https://doc.rust-lang.org/1.10.0/book/inline-assembly.html">inline assembly</a> macro.</p> <p>Inline assembly allows us to write raw x86 assembly within a Rust function. The feature is unstable, so we need to add <code>#![feature(asm)]</code> to our <code>src/lib.rs</code>. 
Then we’re able to write a <code>divide_by_zero</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>divide_by_zero() { </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> asm!(</span><span style="color:#d69d85;">&quot;mov dx, 0; div dx&quot; </span><span>::: </span><span style="color:#d69d85;">&quot;ax&quot;</span><span>, </span><span style="color:#d69d85;">&quot;dx&quot; </span><span>: </span><span style="color:#d69d85;">&quot;volatile&quot;</span><span>, </span><span style="color:#d69d85;">&quot;intel&quot;</span><span>) </span><span> } </span><span>} </span></code></pre> <p>Let’s try to decode it:</p> <ul> <li>The <code>asm!</code> macro emits raw assembly instructions, so it’s <code>unsafe</code> to use it.</li> <li>We insert two assembly instructions here: <code>mov dx, 0</code> and <code>div dx</code>. The former loads a 0 into the <code>dx</code> register (a subset of <code>rdx</code>) and the latter divides the <code>ax</code> register by <code>dx</code>. (The <code>div</code> instruction always implicitly operates on the <code>ax</code> register).</li> <li>The colons are separators. After the first <code>:</code> we could specify output operands and after the second <code>:</code> we could specify input operands. We need neither, so we leave these areas empty.</li> <li>After the third colon, we specify the so-called <em>clobbers</em>. These tell the compiler that our assembly modifies the values of some registers. Otherwise, the compiler assumes that the registers preserve their value. In our case, we clobber <code>dx</code> (we load 0 to it) and <code>ax</code> (the <code>div</code> instruction places the result in it).</li> <li>The last block (after the 4th colon) specifies some options. The <code>volatile</code> option tells the compiler: “This code has side effects. 
Do not delete it and do not move it elsewhere”. In our case, the “side effect” is the divide-by-zero exception. Finally, the <code>intel</code> option allows us to use the Intel assembly syntax instead of the default AT&amp;T syntax.</li> </ul> <p>Let’s use our new <code>divide_by_zero</code> function to raise a CPU exception:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>rust_main(...) { </span><span> </span><span style="color:#569cd6;">... </span><span> </span><span> </span><span style="color:#608b4e;">// provoke a divide-by-zero fault </span><span> divide_by_zero(); </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>It works! We see an <code>EXCEPTION: DIVIDE BY ZERO</code> message at the bottom of our screen:</p> <p><img src="https://os.phil-opp.com/catching-exceptions/qemu-divide-error-println.png" alt="QEMU screenshot with EXCEPTION: DIVIDE BY ZERO message" /></p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>We’ve successfully caught our first exception! However, our <code>EXCEPTION: DIVIDE BY ZERO</code> message doesn’t contain much information about the cause of the exception. The next post improves the situation by printing, among other things, the current stack pointer and the address of the causing instruction. 
We will also explore other exceptions such as page faults, for which the CPU pushes an <em>error code</em> on the stack.</p> Kernel Heap Mon, 11 Apr 2016 00:00:00 +0000 https://os.phil-opp.com/kernel-heap/ https://os.phil-opp.com/kernel-heap/ <p>In the previous posts we created a <a href="https://os.phil-opp.com/allocating-frames/">frame allocator</a> and a <a href="https://os.phil-opp.com/page-tables/">page table module</a>. Now we are ready to create a kernel heap and a memory allocator. Thus, we will unlock <code>Box</code>, <code>Vec</code>, <code>BTreeMap</code>, and the rest of the <a href="https://doc.rust-lang.org/nightly/alloc/index.html">alloc</a> crate.</p> <span id="continue-reading"></span> <p>As always, you can find the complete source code on <a href="https://github.com/phil-opp/blog_os/tree/first_edition_post_8">GitHub</a>. Please file <a href="https://github.com/phil-opp/blog_os/issues">issues</a> for any problems, questions, or improvement suggestions. There is also a comment section at the end of this page.</p> <h2 id="introduction"><a class="zola-anchor" href="#introduction" aria-label="Anchor link for: introduction">🔗</a>Introduction</h2> <p>The <em>heap</em> is the memory area for long-lived allocations. The programmer can access it by using types like <a href="https://doc.rust-lang.org/rust-by-example/std/box.html">Box</a> or <a href="https://doc.rust-lang.org/book/vectors.html">Vec</a>. Behind the scenes, the compiler manages that memory by inserting calls to some memory allocator. By default, Rust links to the <a href="http://jemalloc.net/">jemalloc</a> allocator (for binaries) or the system allocator (for libraries). However, both rely on <a href="https://en.wikipedia.org/wiki/System_call">system calls</a> such as <a href="https://en.wikipedia.org/wiki/Sbrk">sbrk</a> and are thus unusable in our kernel. So we need to create and link our own allocator.</p> <p>A good allocator is fast and reliable. 
It also effectively utilizes the available memory and keeps <a href="https://en.wikipedia.org/wiki/Fragmentation_(computing)">fragmentation</a> low. Furthermore, it works well for concurrent applications and scales to any number of processors. It even optimizes the memory layout with respect to the CPU caches to improve <a href="https://www.geeksforgeeks.org/locality-of-reference-and-cache-operation-in-cache-memory/">cache locality</a> and avoid <a href="https://mechanical-sympathy.blogspot.de/2011/07/false-sharing.html">false sharing</a>.</p> <p>These requirements make good allocators pretty complex. For example, <a href="http://jemalloc.net/">jemalloc</a> has over 30,000 lines of code. This complexity is out of scope for our kernel, so we will create a much simpler allocator. Nevertheless, it should suffice for the foreseeable future, since we’ll allocate only when it’s absolutely necessary.</p> <h2 id="the-allocator-interface"><a class="zola-anchor" href="#the-allocator-interface" aria-label="Anchor link for: the-allocator-interface">🔗</a>The Allocator Interface</h2> <p>The allocator interface in Rust is defined through the <a href="https://doc.rust-lang.org/1.20.0/alloc/allocator/trait.Alloc.html"><code>Alloc</code> trait</a>, which looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub unsafe trait </span><span>Alloc { </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>alloc(</span><span style="color:#569cd6;">&amp;mut </span><span>self, layout: Layout) -&gt; Result&lt;</span><span style="color:#569cd6;">*mut u8</span><span>, AllocErr&gt;; </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>dealloc(</span><span style="color:#569cd6;">&amp;mut </span><span>self, ptr: </span><span style="color:#569cd6;">*mut u8</span><span>, layout: Layout); </span><span> […] </span><span 
style="color:#608b4e;">// about 13 methods with default implementations </span><span>} </span></code></pre> <p>The <code>alloc</code> method should allocate a memory block with the size and alignment given through the <code>layout</code> parameter. The <code>dealloc</code> method should free such memory blocks again. Both methods are <code>unsafe</code>, as is the trait itself. There are different reasons for this:</p> <ul> <li>Implementing the <code>Alloc</code> trait is unsafe, because the implementation must satisfy a set of contracts. Among other things, pointers returned by <code>alloc</code> must point to valid memory and adhere to the <code>Layout</code> requirements.</li> <li>Calling <code>alloc</code> is unsafe because the caller must ensure that the passed layout does not have size zero. I think this is for compatibility with existing C allocators, where zero-sized allocations are undefined behavior.</li> <li>Calling <code>dealloc</code> is unsafe because the caller must guarantee that the passed parameters adhere to the contract. For example, <code>ptr</code> must denote a valid memory block allocated via this allocator.</li> </ul> <p>To set the system allocator, the <code>global_allocator</code> attribute can be added to a <code>static</code> that implements <code>Alloc</code> for a shared reference to itself. 
For example:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[global_allocator] </span><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">MY_ALLOCATOR</span><span>: MyAllocator = MyAllocator {</span><span style="color:#569cd6;">...</span><span>}; </span><span> </span><span style="color:#569cd6;">unsafe impl</span><span>&lt;</span><span style="color:#569cd6;">&#39;a</span><span>&gt; Alloc </span><span style="color:#569cd6;">for &amp;&#39;a </span><span>MyAllocator { </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>alloc(</span><span style="color:#569cd6;">&amp;mut </span><span>self, layout: Layout) -&gt; Result&lt;</span><span style="color:#569cd6;">*mut u8</span><span>, AllocErr&gt; {</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>dealloc(</span><span style="color:#569cd6;">&amp;mut </span><span>self, ptr: </span><span style="color:#569cd6;">*mut u8</span><span>, layout: Layout) {</span><span style="color:#569cd6;">...</span><span>} </span><span>} </span></code></pre> <p>Note that <code>Alloc</code> needs to be implemented for <code>&amp;MyAllocator</code>, not for <code>MyAllocator</code>. The reason is that the <code>alloc</code> and <code>dealloc</code> methods require mutable <code>self</code> references, but there’s no way to get such a reference safely from a <code>static</code>. 
By requiring implementations for <code>&amp;MyAllocator</code>, the global allocator interface avoids this problem and pushes the burden of synchronization onto the user.</p> <h2 id="including-the-alloc-crate"><a class="zola-anchor" href="#including-the-alloc-crate" aria-label="Anchor link for: including-the-alloc-crate">🔗</a>Including the alloc crate</h2> <p>The <code>Alloc</code> trait is part of the <code>alloc</code> crate, which like <code>core</code> is a subset of Rust’s standard library. Apart from the trait, the crate also contains the standard types that require allocations such as <code>Box</code>, <code>Vec</code> and <code>Arc</code>. We can include it through a simple <code>extern crate</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span>#![feature(alloc)] </span><span style="color:#608b4e;">// the alloc crate is still unstable </span><span> </span><span>[</span><span style="color:#569cd6;">...</span><span>] </span><span> </span><span>#[macro_use] </span><span style="color:#569cd6;">extern crate</span><span> alloc; </span></code></pre> <p>We don’t need to add anything to our Cargo.toml, since the <code>alloc</code> crate is part of the standard library and shipped with the Rust compiler. 
The <code>alloc</code> crate provides the <a href="https://doc.rust-lang.org/1.10.0/collections/macro.format!.html">format!</a> and <a href="https://doc.rust-lang.org/1.10.0/collections/macro.vec!.html">vec!</a> macros, so we use <code>#[macro_use]</code> to import them.</p> <p>When we try to compile our crate now, the following error occurs:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error[E0463]: can&#39;t find crate for `alloc` </span><span> --&gt; src/lib.rs:10:1 </span><span> | </span><span>16 | extern crate alloc; </span><span> | ^^^^^^^^^^^^^^^^^^^ can&#39;t find crate </span></code></pre> <p>The problem is that <a href="https://github.com/japaric/xargo"><code>xargo</code></a> only cross compiles <code>libcore</code> by default. To also cross compile the <code>alloc</code> crate, we need to create a file named <code>Xargo.toml</code> in our project root (right next to the <code>Cargo.toml</code>) with the following content:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span>[</span><span style="color:#808080;">target.x86_64-blog_os.dependencies</span><span>] </span><span style="color:#569cd6;">alloc </span><span>= {} </span></code></pre> <p>This instructs <code>xargo</code> that we also need <code>alloc</code>. It still doesn’t compile, since we need to define a global allocator in order to use the <code>alloc</code> crate:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: no #[default_lib_allocator] found but one is required; is libstd not linked? </span></code></pre> <h2 id="a-bump-allocator"><a class="zola-anchor" href="#a-bump-allocator" aria-label="Anchor link for: a-bump-allocator">🔗</a>A Bump Allocator</h2> <p>For our first allocator, we start simple. 
We create a <code>memory::heap_allocator</code> module containing a so-called <em>bump allocator</em>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/mod.rs </span><span> </span><span style="color:#569cd6;">mod </span><span>heap_allocator; </span><span> </span><span style="color:#608b4e;">// in src/memory/heap_allocator.rs </span><span> </span><span style="color:#569cd6;">use </span><span>alloc::heap::{Alloc, AllocErr, Layout}; </span><span> </span><span style="color:#608b4e;">/// A simple allocator that allocates memory linearly and ignores freed memory. </span><span>#[derive(Debug)] </span><span style="color:#569cd6;">pub struct </span><span>BumpAllocator { </span><span> heap_start: </span><span style="color:#569cd6;">usize</span><span>, </span><span> heap_end: </span><span style="color:#569cd6;">usize</span><span>, </span><span> next: </span><span style="color:#569cd6;">usize</span><span>, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>BumpAllocator { </span><span> </span><span style="color:#569cd6;">pub const fn </span><span>new(heap_start: </span><span style="color:#569cd6;">usize</span><span>, heap_end: </span><span style="color:#569cd6;">usize</span><span>) -&gt; </span><span style="color:#569cd6;">Self </span><span>{ </span><span> </span><span style="color:#569cd6;">Self </span><span>{ heap_start, heap_end, next: heap_start } </span><span> } </span><span>} </span><span> </span><span style="color:#569cd6;">unsafe impl </span><span>Alloc </span><span style="color:#569cd6;">for </span><span>BumpAllocator { </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>alloc(</span><span style="color:#569cd6;">&amp;mut </span><span>self, layout: Layout) -&gt; Result&lt;</span><span style="color:#569cd6;">*mut u8</span><span>, AllocErr&gt; { </span><span> </span><span 
style="color:#569cd6;">let</span><span> alloc_start = align_up(self.next, layout.align()); </span><span> </span><span style="color:#569cd6;">let</span><span> alloc_end = alloc_start.saturating_add(layout.size()); </span><span> </span><span> </span><span style="color:#569cd6;">if</span><span> alloc_end &lt;= self.heap_end { </span><span> self.next = alloc_end; </span><span> Ok(alloc_start </span><span style="color:#569cd6;">as *mut u8</span><span>) </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> Err(AllocErr::Exhausted{ request: layout }) </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>dealloc(</span><span style="color:#569cd6;">&amp;mut </span><span>self, ptr: </span><span style="color:#569cd6;">*mut u8</span><span>, layout: Layout) { </span><span> </span><span style="color:#608b4e;">// do nothing, leak memory </span><span> } </span><span>} </span></code></pre> <p>We also need to add <code>#![feature(allocator_api)]</code> to our <code>lib.rs</code>, since the allocator API is still unstable.</p> <p>The <code>heap_start</code> and <code>heap_end</code> fields contain the start and end address of our kernel heap. The <code>next</code> field contains the next free address and is increased after every allocation. To <code>allocate</code> a memory block we align the <code>next</code> address using the <code>align_up</code> function (described below). Then we add up the desired <code>size</code> and make sure that we don’t exceed the end of the heap. We use a saturating add so that the <code>alloc_end</code> cannot overflow, which could lead to an invalid allocation. If everything goes well, we update the <code>next</code> address and return a pointer to the start address of the allocation. 
Otherwise, we return an <code>AllocErr::Exhausted</code> error.</p> <h3 id="alignment"><a class="zola-anchor" href="#alignment" aria-label="Anchor link for: alignment">🔗</a>Alignment</h3> <p>In order to simplify alignment, we add <code>align_down</code> and <code>align_up</code> functions:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">/// Align downwards. Returns the greatest x with alignment `align` </span><span style="color:#608b4e;">/// so that x &lt;= addr. The alignment must be a power of 2. </span><span style="color:#569cd6;">pub fn </span><span>align_down(addr: </span><span style="color:#569cd6;">usize</span><span>, align: </span><span style="color:#569cd6;">usize</span><span>) -&gt; </span><span style="color:#569cd6;">usize </span><span>{ </span><span> </span><span style="color:#569cd6;">if</span><span> align.is_power_of_two() { </span><span> addr </span><span style="color:#569cd6;">&amp; !</span><span>(align - </span><span style="color:#b5cea8;">1</span><span>) </span><span> } </span><span style="color:#569cd6;">else if</span><span> align == </span><span style="color:#b5cea8;">0 </span><span>{ </span><span> addr </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> panic!(</span><span style="color:#d69d85;">&quot;`align` must be a power of 2&quot;</span><span>); </span><span> } </span><span>} </span><span> </span><span style="color:#608b4e;">/// Align upwards. Returns the smallest x with alignment `align` </span><span style="color:#608b4e;">/// so that x &gt;= addr. The alignment must be a power of 2. 
</span><span style="color:#569cd6;">pub fn </span><span>align_up(addr: </span><span style="color:#569cd6;">usize</span><span>, align: </span><span style="color:#569cd6;">usize</span><span>) -&gt; </span><span style="color:#569cd6;">usize </span><span>{ </span><span> align_down(addr + align - </span><span style="color:#b5cea8;">1</span><span>, align) </span><span>} </span></code></pre> <p>Let’s start with <code>align_down</code>: If the alignment is a valid power of two (i.e. in <code>{1,2,4,8,…}</code>), we use some bitwise operations to return the aligned address. It works because every power of two has exactly one bit set in its binary representation. For example, the numbers <code>{1,2,4,8,…}</code> are <code>{1,10,100,1000,…}</code> in binary. By subtracting 1 we get <code>{0,01,011,0111,…}</code>. These binary numbers have a <code>1</code> at exactly the positions that need to be zeroed in <code>addr</code>. For example, the last 3 bits need to be zeroed for an alignment of 8.</p> <p>To align <code>addr</code>, we create a <a href="https://en.wikipedia.org/wiki/Mask_(computing)">bitmask</a> from <code>align-1</code>. We want a <code>0</code> at the position of each <code>1</code>, so we invert it using <code>!</code>. After that, the binary numbers look like this: <code>{…11111,…11110,…11100,…11000,…}</code>. Finally, we zero the correct bits using a bitwise <code>AND</code>.</p> <p>Aligning upwards is simple now. We just increase <code>addr</code> by <code>align-1</code> and call <code>align_down</code>. We add <code>align-1</code> instead of <code>align</code> because we would otherwise waste <code>align</code> bytes for already aligned addresses.</p> <h3 id="reusing-freed-memory"><a class="zola-anchor" href="#reusing-freed-memory" aria-label="Anchor link for: reusing-freed-memory">🔗</a>Reusing Freed Memory</h3> <p>The heap memory is limited, so we should reuse freed memory for new allocations. 
This sounds simple, but is not so easy in practice since allocations can live arbitrarily long (and can be freed in an arbitrary order). This means that we need some kind of data structure to keep track of which memory areas are free and which are in use. This data structure should be very optimized since it causes overheads in both space (i.e. it needs backing memory) and time (i.e. accessing and organizing it needs CPU cycles).</p> <p>Our bump allocator only keeps track of the next free memory address, which doesn’t suffice to keep track of freed memory areas. So our only choice is to ignore deallocations and leak the corresponding memory. Thus our allocator quickly runs out of memory in a real system, but it suffices for simple testing. Later in this post, we will introduce a better allocator that does not leak freed memory.</p> <h3 id="using-it-as-system-allocator"><a class="zola-anchor" href="#using-it-as-system-allocator" aria-label="Anchor link for: using-it-as-system-allocator">🔗</a>Using it as System Allocator</h3> <p>Above we saw that we can use a static allocator as system allocator through the <code>global_allocator</code> attribute:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[global_allocator] </span><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">ALLOCATOR</span><span>: MyAllocator = MyAllocator {</span><span style="color:#569cd6;">...</span><span>}; </span></code></pre> <p>This requires an implementation of <code>Alloc</code> for <code>&amp;MyAllocator</code>, i.e. a shared reference. 
If we try to add such an implementation for our bump allocator (<code>unsafe impl&lt;'a&gt; Alloc for &amp;'a BumpAllocator</code>), we have a problem: Our <code>alloc</code> method requires updating the <code>next</code> field, which is not possible for a shared reference.</p> <p>One solution could be to put the bump allocator behind a Mutex and wrap it into a new type, for which we can implement <code>Alloc</code> for a shared reference:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">struct </span><span>LockedBumpAllocator(Mutex&lt;BumpAllocator&gt;); </span><span> </span><span style="color:#569cd6;">unsafe impl</span><span>&lt;</span><span style="color:#569cd6;">&#39;a</span><span>&gt; Alloc </span><span style="color:#569cd6;">for &amp;&#39;a </span><span>LockedBumpAllocator { </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>alloc(</span><span style="color:#569cd6;">&amp;mut </span><span>self, layout: Layout) -&gt; Result&lt;</span><span style="color:#569cd6;">*mut u8</span><span>, AllocErr&gt; { </span><span> self.</span><span style="color:#b5cea8;">0.</span><span>lock().alloc(layout) </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>dealloc(</span><span style="color:#569cd6;">&amp;mut </span><span>self, ptr: </span><span style="color:#569cd6;">*mut u8</span><span>, layout: Layout) { </span><span> self.</span><span style="color:#b5cea8;">0.</span><span>lock().dealloc(ptr, layout) </span><span> } </span><span>} </span></code></pre> <p>However, there is a more interesting solution for our bump allocator that avoids locking altogether. The idea is to exploit the fact that we only need to update a single <code>usize</code> field, which we can do by using an <code>AtomicUsize</code> type. 
This type uses special synchronized hardware instructions to ensure data race freedom without requiring locks.</p> <h4 id="a-lock-free-bump-allocator"><a class="zola-anchor" href="#a-lock-free-bump-allocator" aria-label="Anchor link for: a-lock-free-bump-allocator">🔗</a>A lock-free Bump Allocator</h4> <p>A lock-free implementation looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>core::sync::atomic::{AtomicUsize, Ordering}; </span><span> </span><span style="color:#608b4e;">/// A simple allocator that allocates memory linearly and ignores freed memory. </span><span>#[derive(Debug)] </span><span style="color:#569cd6;">pub struct </span><span>BumpAllocator { </span><span> heap_start: </span><span style="color:#569cd6;">usize</span><span>, </span><span> heap_end: </span><span style="color:#569cd6;">usize</span><span>, </span><span> next: AtomicUsize, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>BumpAllocator { </span><span> </span><span style="color:#569cd6;">pub const fn </span><span>new(heap_start: </span><span style="color:#569cd6;">usize</span><span>, heap_end: </span><span style="color:#569cd6;">usize</span><span>) -&gt; </span><span style="color:#569cd6;">Self </span><span>{ </span><span> </span><span style="color:#608b4e;">// NOTE: requires adding #![feature(const_atomic_usize_new)] to lib.rs </span><span> </span><span style="color:#569cd6;">Self </span><span>{ heap_start, heap_end, next: AtomicUsize::new(heap_start) } </span><span> } </span><span>} </span><span> </span><span style="color:#569cd6;">unsafe impl</span><span>&lt;</span><span style="color:#569cd6;">&#39;a</span><span>&gt; Alloc </span><span style="color:#569cd6;">for &amp;&#39;a </span><span>BumpAllocator { </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>alloc(</span><span 
style="color:#569cd6;">&amp;mut </span><span>self, layout: Layout) -&gt; Result&lt;</span><span style="color:#569cd6;">*mut u8</span><span>, AllocErr&gt; { </span><span> </span><span style="color:#569cd6;">loop </span><span>{ </span><span> </span><span style="color:#608b4e;">// load current state of the `next` field </span><span> </span><span style="color:#569cd6;">let</span><span> current_next = self.next.load(Ordering::Relaxed); </span><span> </span><span style="color:#569cd6;">let</span><span> alloc_start = align_up(current_next, layout.align()); </span><span> </span><span style="color:#569cd6;">let</span><span> alloc_end = alloc_start.saturating_add(layout.size()); </span><span> </span><span> </span><span style="color:#569cd6;">if</span><span> alloc_end &lt;= self.heap_end { </span><span> </span><span style="color:#608b4e;">// update the `next` pointer if it still has the value `current_next` </span><span> </span><span style="color:#569cd6;">let</span><span> next_now = self.next.compare_and_swap(current_next, alloc_end, </span><span> Ordering::Relaxed); </span><span> </span><span style="color:#569cd6;">if</span><span> next_now == current_next { </span><span> </span><span style="color:#608b4e;">// next address was successfully updated, allocation succeeded </span><span> </span><span style="color:#569cd6;">return </span><span>Ok(alloc_start </span><span style="color:#569cd6;">as *mut u8</span><span>); </span><span> } </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> </span><span style="color:#569cd6;">return </span><span>Err(AllocErr::Exhausted{ request: layout }) </span><span> } </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>dealloc(</span><span style="color:#569cd6;">&amp;mut </span><span>self, ptr: </span><span style="color:#569cd6;">*mut u8</span><span>, layout: Layout) { </span><span> </span><span style="color:#608b4e;">// do nothing, leak memory 
</span><span> } </span><span>} </span></code></pre> <p>The implementation is a bit more complicated now. First, there is now a <code>loop</code> around the whole method body, since we might need multiple tries until we succeed (e.g. if multiple threads try to allocate at the same time). Also, the load operation is an explicit method call now, i.e. <code>self.next.load(Ordering::Relaxed)</code> instead of just <code>self.next</code>. The ordering parameter makes it possible to restrict the automatic instruction reordering performed by both the compiler and the CPU itself. For example, it is used when implementing locks to ensure that no write to the locked variable happens before the lock is acquired. We don’t have such requirements, so we use the less restrictive <code>Relaxed</code> ordering.</p> <p>The heart of this lock-free method is the <code>compare_and_swap</code> call that updates the <code>next</code> address:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">... </span><span style="color:#569cd6;">let</span><span> next_now = self.next.compare_and_swap(current_next, alloc_end, </span><span> Ordering::Relaxed); </span><span style="color:#569cd6;">if</span><span> next_now == current_next { </span><span> </span><span style="color:#608b4e;">// next address was successfully updated, allocation succeeded </span><span> </span><span style="color:#569cd6;">return </span><span>Ok(alloc_start </span><span style="color:#569cd6;">as *mut u8</span><span>); </span><span>} </span><span style="color:#569cd6;">... </span></code></pre> <p>Compare-and-swap is a special CPU instruction that updates a variable with a given value if it still contains the value we expect. If it doesn’t, it means that another thread updated the value simultaneously, so we need to try again. 
The important feature is that this happens in a single uninterruptible operation (thus the name <code>atomic</code>), so no partial updates or intermediate states are possible.</p> <p>In detail, <code>compare_and_swap</code> works by comparing <code>next</code> with the first argument and, in case they’re equal, updates <code>next</code> with the second parameter (the previous value is returned). To find out whether a swap happened, we check the returned previous value of <code>next</code>. If it is equal to the first parameter, the values were swapped. Otherwise, we try again in the next loop iteration.</p> <h4 id="setting-the-global-allocator"><a class="zola-anchor" href="#setting-the-global-allocator" aria-label="Anchor link for: setting-the-global-allocator">🔗</a>Setting the Global Allocator</h4> <p>Now we can define a static bump allocator that we can set as the system allocator:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub const </span><span style="color:#b4cea8;">HEAP_START</span><span>: </span><span style="color:#569cd6;">usize </span><span>= </span><span style="color:#b5cea8;">0o_000_001_000_000_0000</span><span>; </span><span style="color:#569cd6;">pub const </span><span style="color:#b4cea8;">HEAP_SIZE</span><span>: </span><span style="color:#569cd6;">usize </span><span>= </span><span style="color:#b5cea8;">100 </span><span>* </span><span style="color:#b5cea8;">1024</span><span>; </span><span style="color:#608b4e;">// 100 KiB </span><span> </span><span>#[global_allocator] </span><span style="color:#569cd6;">static </span><span style="color:#b4cea8;">HEAP_ALLOCATOR</span><span>: BumpAllocator = BumpAllocator::new(</span><span style="color:#b4cea8;">HEAP_START</span><span>, </span><span> </span><span style="color:#b4cea8;">HEAP_START </span><span>+ </span><span style="color:#b4cea8;">HEAP_SIZE</span><span>); </span></code></pre> 
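<p>The octal notation of <code>HEAP_START</code> is convenient because each group of three octal digits corresponds to one 9-bit page table index, and the last four digits form the 12-bit page offset. As a rough sketch, the constants above can be decomposed as follows. Note that the <code>AddressParts</code> type and the <code>split_address</code> function are hypothetical helpers for illustration only and not part of our kernel, and that the sketch assumes 4-level paging with 4 KiB pages and uses <code>u64</code> instead of <code>usize</code> so that it also compiles on 32-bit hosts:</p>

```rust
/// Page table indexes and page offset encoded in a virtual address.
/// (Hypothetical helper type for illustration; not part of the kernel code.)
#[derive(Debug, PartialEq, Eq)]
pub struct AddressParts {
    pub p4: u64,
    pub p3: u64,
    pub p2: u64,
    pub p1: u64,
    pub offset: u64,
}

/// Splits a virtual address into its four 9-bit table indexes and the
/// 12-bit page offset (assumes 4-level paging with 4 KiB pages).
pub fn split_address(addr: u64) -> AddressParts {
    AddressParts {
        p4: (addr >> 39) & 0o777,
        p3: (addr >> 30) & 0o777,
        p2: (addr >> 21) & 0o777,
        p1: (addr >> 12) & 0o777,
        offset: addr & 0o7777,
    }
}

// Same values as in the post, but typed as u64 for this sketch.
pub const HEAP_START: u64 = 0o_000_001_000_000_0000;
pub const HEAP_SIZE: u64 = 100 * 1024; // 100 KiB
```

<p>For <code>HEAP_START</code>, <code>split_address</code> yields P4 index 0 and P3 index 1, with all other parts being zero. The last heap byte, <code>HEAP_START + HEAP_SIZE - 1</code>, falls into P1 entry 24, so the 100 KiB heap covers 25 pages.</p>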
<p>We use <code>0o_000_001_000_000_0000</code> as the heap start address, which is the address starting at the second <code>P3</code> entry. It doesn’t really matter which address we choose here as long as it’s unused. We use a heap size of 100 KiB, which should be large enough for the near future.</p> <p>Putting the above in the <code>memory::heap_allocator</code> module would make the most sense, but unfortunately there is currently a <a href="https://github.com/rust-lang/rust/issues/44113">weird bug</a> in the global allocator implementation that requires putting the global allocator in the root module. I hope it’s fixed soon, but until then we need to put the above lines in <code>src/lib.rs</code>. For that, we need to make the <code>memory::heap_allocator</code> module public and add an import for <code>BumpAllocator</code>. We also need to add <code>#![feature(global_allocator)]</code> at the top of our <code>lib.rs</code>, since the <code>global_allocator</code> attribute is still unstable.</p> <p>That’s it! We have successfully created and linked a custom system allocator. Now we’re ready to test it.</p> <h3 id="testing"><a class="zola-anchor" href="#testing" aria-label="Anchor link for: testing">🔗</a>Testing</h3> <p>We should be able to allocate memory on the heap now. Let’s try it in our <code>rust_main</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in rust_main in src/lib.rs </span><span> </span><span style="color:#569cd6;">use </span><span>alloc::boxed::Box; </span><span style="color:#569cd6;">let</span><span> heap_test = Box::new(</span><span style="color:#b5cea8;">42</span><span>); </span></code></pre> <p>When we run it, a triple fault occurs and causes permanent rebooting. 
Let’s try to debug it using QEMU and objdump as described <a href="https://os.phil-opp.com/remap-the-kernel/#debugging">in the previous post</a>:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; qemu-system-x86_64 -d int -no-reboot -cdrom build/os-x86_64.iso </span><span>… </span><span>check_exception old: 0xffffffff new 0xe </span><span> 0: v=0e e=0002 i=0 cpl=0 IP=0008:0000000000102860 pc=0000000000102860 </span><span> SP=0010:0000000000116af0 CR2=0000000040000000 </span><span>… </span></code></pre> <p>Aha! It’s a <a href="https://wiki.osdev.org/Exceptions#Page_Fault">page fault</a> (<code>v=0e</code>) and was caused by the code at <code>0x102860</code>. The code tried to write (<code>e=0002</code>) to address <code>0x40000000</code>. This address is <code>0o_000_001_000_000_0000</code> in octal, which is the <code>HEAP_START</code> address defined above. Of course it page-faults: We have forgotten to map the heap memory to some physical memory.</p> <h3 id="some-refactoring"><a class="zola-anchor" href="#some-refactoring" aria-label="Anchor link for: some-refactoring">🔗</a>Some Refactoring</h3> <p>In order to map the heap cleanly, we do a bit of refactoring first. We move all memory initialization from our <code>rust_main</code> to a new <code>memory::init</code> function. 
Now our <code>rust_main</code> looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>rust_main(multiboot_information_address: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> </span><span style="color:#608b4e;">// ATTENTION: we have a very small stack and no guard page </span><span> vga_buffer::clear_screen(); </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> boot_info = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> multiboot2::load(multiboot_information_address) </span><span> }; </span><span> enable_nxe_bit(); </span><span> enable_write_protect_bit(); </span><span> </span><span> </span><span style="color:#608b4e;">// set up guard page and map the heap pages </span><span> memory::init(boot_info); </span><span> </span><span> </span><span style="color:#569cd6;">use </span><span>alloc::boxed::Box; </span><span> </span><span style="color:#569cd6;">let</span><span> heap_test = Box::new(</span><span style="color:#b5cea8;">42</span><span>); </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>The <code>memory::init</code> function looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust 
"><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/mod.rs </span><span> </span><span style="color:#569cd6;">use </span><span>multiboot2::BootInformation; </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init(boot_info: </span><span style="color:#569cd6;">&amp;</span><span>BootInformation) { </span><span> </span><span style="color:#569cd6;">let</span><span> memory_map_tag = boot_info.memory_map_tag().expect( </span><span> </span><span style="color:#d69d85;">&quot;Memory map tag required&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">let</span><span> elf_sections_tag = boot_info.elf_sections_tag().expect( </span><span> </span><span style="color:#d69d85;">&quot;Elf sections tag required&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> kernel_start = elf_sections_tag.sections() </span><span> .filter(|s| s.is_allocated()).map(|s| s.addr).min().unwrap(); </span><span> </span><span style="color:#569cd6;">let</span><span> kernel_end = elf_sections_tag.sections() </span><span> .filter(|s| s.is_allocated()).map(|s| s.addr + s.size).max() </span><span> .unwrap(); </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;kernel start: </span><span style="color:#b4cea8;">{:#x}</span><span style="color:#d69d85;">, kernel end: </span><span style="color:#b4cea8;">{:#x}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span> kernel_start, </span><span> kernel_end); </span><span> println!(</span><span style="color:#d69d85;">&quot;multiboot start: </span><span style="color:#b4cea8;">{:#x}</span><span style="color:#d69d85;">, multiboot end: </span><span style="color:#b4cea8;">{:#x}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span> boot_info.start_address(), </span><span> boot_info.end_address()); </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> 
frame_allocator = AreaFrameAllocator::new( </span><span> kernel_start </span><span style="color:#569cd6;">as usize</span><span>, kernel_end </span><span style="color:#569cd6;">as usize</span><span>, </span><span> boot_info.start_address(), boot_info.end_address(), </span><span> memory_map_tag.memory_areas()); </span><span> </span><span> paging::remap_the_kernel(</span><span style="color:#569cd6;">&amp;mut</span><span> frame_allocator, boot_info); </span><span>} </span></code></pre> <p>We’ve just moved the code to a new function. However, we’ve sneaked some improvements in:</p> <ul> <li>An additional <code>.filter(|s| s.is_allocated())</code> in the calculation of <code>kernel_start</code> and <code>kernel_end</code>. This ignores all sections that aren’t loaded to memory (such as debug sections). Thus, the kernel end address is no longer artificially increased by such sections.</li> <li>We use the <code>start_address()</code> and <code>end_address()</code> methods of <code>boot_info</code> instead of calculating the addresses manually.</li> <li>We use the alternate <code>{:#x}</code> form when printing kernel/multiboot addresses. Before, we used <code>0x{:x}</code>, which leads to the same result. For a complete list of these “alternate” formatting forms, check out the <a href="https://doc.rust-lang.org/nightly/std/fmt/index.html#sign0">std::fmt documentation</a>.</li> </ul> <h3 id="safety"><a class="zola-anchor" href="#safety" aria-label="Anchor link for: safety">🔗</a>Safety</h3> <p>It is important that the <code>memory::init</code> function is called only once, because it creates a new frame allocator based on kernel and multiboot start/end. When we call it a second time, a new frame allocator is created that reassigns the same frames, even if they are already in use.</p> <p>In the second call it would use an identical frame allocator to remap the kernel. 
The <code>remap_the_kernel</code> function would request a frame from the frame allocator to create a new page table. But the returned frame is already in use, since we used it to create our current page table in the first call. In order to initialize the new table, the function zeroes it. This is the point where everything breaks, since we zero our current page table. The CPU is unable to read the next instruction and throws a page fault.</p> <p>So we need to ensure that <code>memory::init</code> can only be called once. We could mark it as <code>unsafe</code>, which would bring it in line with Rust’s memory safety rules. However, that would just push the unsafety to the caller. The caller can still accidentally call the function twice; the only difference is that the mistake needs to happen inside <code>unsafe</code> blocks.</p> <p>A better solution is to insert a check at the function’s beginning that panics if the function is called a second time. This approach has a small runtime cost, but we only call it once, so it’s negligible. And we avoid two <code>unsafe</code> blocks (one at the calling site and one at the function itself), which is always good.</p> <p>In order to make such checks easy, I created a small crate named <a href="https://crates.io/crates/once">once</a>. To add it, we run <code>cargo add once</code> and add the following to our <code>src/lib.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span>#[macro_use] </span><span style="color:#569cd6;">extern crate</span><span> once; </span></code></pre> <p>The crate provides an <a href="https://docs.rs/once/0.3.2/once/macro.assert_has_not_been_called!.html">assert_has_not_been_called!</a> macro (sorry for the long name :D). 
We can use it to fix the safety problem easily:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/mod.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init(boot_info: </span><span style="color:#569cd6;">&amp;</span><span>BootInformation) { </span><span> assert_has_not_been_called!(</span><span style="color:#d69d85;">&quot;memory::init must be called only once&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> memory_map_tag = </span><span style="color:#569cd6;">... </span><span> </span><span style="color:#569cd6;">... </span><span>} </span></code></pre> <p>That’s it. Now our <code>memory::init</code> function can only be called once. The macro works by creating a static <a href="https://doc.rust-lang.org/nightly/core/sync/atomic/struct.AtomicBool.html">AtomicBool</a> named <code>CALLED</code>, which is initialized to <code>false</code>. When the macro is invoked, it checks the value of <code>CALLED</code> and sets it to <code>true</code>. If the value was already <code>true</code> before, the macro panics.</p> <h3 id="mapping-the-heap"><a class="zola-anchor" href="#mapping-the-heap" aria-label="Anchor link for: mapping-the-heap">🔗</a>Mapping the Heap</h3> <p>Now we’re ready to map the heap pages. In order to do it, we need access to the <code>ActivePageTable</code> or <code>Mapper</code> instance (see the <a href="https://os.phil-opp.com/page-tables/">page table</a> and <a href="https://os.phil-opp.com/remap-the-kernel/">kernel remapping</a> posts). 
For that we return it from the <code>paging::remap_the_kernel</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/paging/mod.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>remap_the_kernel&lt;A&gt;(allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A, boot_info: </span><span style="color:#569cd6;">&amp;</span><span>BootInformation) </span><span> -&gt; ActivePageTable </span><span style="color:#608b4e;">// new </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{ </span><span> </span><span style="color:#569cd6;">... </span><span> println!(</span><span style="color:#d69d85;">&quot;guard page at </span><span style="color:#b4cea8;">{:#x}</span><span style="color:#d69d85;">&quot;</span><span>, old_p4_page.start_address()); </span><span> </span><span> active_table </span><span style="color:#608b4e;">// new </span><span>} </span></code></pre> <p>Now we have full page table access in the <code>memory::init</code> function. This allows us to map the heap pages to physical frames:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/mod.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>init(boot_info: </span><span style="color:#569cd6;">&amp;</span><span>BootInformation) { </span><span> </span><span style="color:#569cd6;">... 
</span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> frame_allocator = </span><span style="color:#569cd6;">...</span><span>; </span><span> </span><span> </span><span style="color:#608b4e;">// below is the new part </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> active_table = paging::remap_the_kernel(</span><span style="color:#569cd6;">&amp;mut</span><span> frame_allocator, </span><span> boot_info); </span><span> </span><span> </span><span style="color:#569cd6;">use </span><span>self::paging::Page; </span><span> </span><span style="color:#569cd6;">use </span><span>{</span><span style="color:#b4cea8;">HEAP_START</span><span>, </span><span style="color:#b4cea8;">HEAP_SIZE</span><span>}; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> heap_start_page = Page::containing_address(</span><span style="color:#b4cea8;">HEAP_START</span><span>); </span><span> </span><span style="color:#569cd6;">let</span><span> heap_end_page = Page::containing_address(</span><span style="color:#b4cea8;">HEAP_START </span><span>+ </span><span style="color:#b4cea8;">HEAP_SIZE</span><span>-</span><span style="color:#b5cea8;">1</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">for</span><span> page </span><span style="color:#569cd6;">in </span><span>Page::range_inclusive(heap_start_page, heap_end_page) { </span><span> active_table.map(page, paging::</span><span style="color:#b4cea8;">WRITABLE</span><span>, </span><span style="color:#569cd6;">&amp;mut</span><span> frame_allocator); </span><span> } </span><span>} </span></code></pre> <p>The <code>Page::range_inclusive</code> function is just a copy of the <code>Frame::range_inclusive</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/paging/mod.rs </span><span> 
</span><span>#[derive(…, PartialEq, Eq, PartialOrd, Ord)] </span><span style="color:#569cd6;">pub struct </span><span>Page {...} </span><span> </span><span style="color:#569cd6;">impl </span><span>Page { </span><span> </span><span style="color:#569cd6;">... </span><span> </span><span style="color:#569cd6;">pub fn </span><span>range_inclusive(start: Page, end: Page) -&gt; PageIter { </span><span> PageIter { </span><span> start: start, </span><span> end: end, </span><span> } </span><span> } </span><span>} </span><span> </span><span style="color:#569cd6;">pub struct </span><span>PageIter { </span><span> start: Page, </span><span> end: Page, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>Iterator </span><span style="color:#569cd6;">for </span><span>PageIter { </span><span> </span><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">Item </span><span>= Page; </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>next(</span><span style="color:#569cd6;">&amp;mut </span><span>self) -&gt; Option&lt;Page&gt; { </span><span> </span><span style="color:#569cd6;">if </span><span>self.start &lt;= self.end { </span><span> </span><span style="color:#569cd6;">let</span><span> page = self.start; </span><span> self.start.number += </span><span style="color:#b5cea8;">1</span><span>; </span><span> Some(page) </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> None </span><span> } </span><span> } </span><span>} </span></code></pre> <p>Now we map the whole heap to physical pages. This needs some time and might introduce a noticeable delay when we increase the heap size in the future. Another drawback is that we consume a large amount of physical frames even though we might not need the whole heap space. 
We will fix these problems in a future post by mapping the pages lazily.</p> <h3 id="it-works"><a class="zola-anchor" href="#it-works" aria-label="Anchor link for: it-works">🔗</a>It works!</h3> <p>Now <code>Box</code> and <code>Vec</code> should work. For example:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in rust_main in src/lib.rs </span><span> </span><span style="color:#569cd6;">use </span><span>alloc::boxed::Box; </span><span style="color:#569cd6;">let mut</span><span> heap_test = Box::new(</span><span style="color:#b5cea8;">42</span><span>); </span><span>*heap_test -= </span><span style="color:#b5cea8;">15</span><span>; </span><span style="color:#569cd6;">let</span><span> heap_test2 = Box::new(</span><span style="color:#d69d85;">&quot;hello&quot;</span><span>); </span><span>println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{:?} {:?}</span><span style="color:#d69d85;">&quot;</span><span>, heap_test, heap_test2); </span><span> </span><span style="color:#569cd6;">let mut</span><span> vec_test = vec![</span><span style="color:#b5cea8;">1</span><span>,</span><span style="color:#b5cea8;">2</span><span>,</span><span style="color:#b5cea8;">3</span><span>,</span><span style="color:#b5cea8;">4</span><span>,</span><span style="color:#b5cea8;">5</span><span>,</span><span style="color:#b5cea8;">6</span><span>,</span><span style="color:#b5cea8;">7</span><span>]; </span><span>vec_test[</span><span style="color:#b5cea8;">3</span><span>] = </span><span style="color:#b5cea8;">42</span><span>; </span><span style="color:#569cd6;">for</span><span> i </span><span style="color:#569cd6;">in &amp;</span><span>vec_test { </span><span> print!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{} </span><span style="color:#d69d85;">&quot;</span><span>, i); </span><span>} </span></code></pre> 
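<p>To know what output to expect, the same operations can be tried on a hosted target first. The sketch below is a stand-in under stated assumptions, not kernel code: it uses the standard library’s <code>Box</code>, <code>Vec</code>, and <code>format!</code> instead of our <code>alloc</code> setup and our own <code>println!</code> macro, and the function names <code>box_demo</code> and <code>vec_demo</code> are made up for illustration. It collects the text into strings instead of printing it:</p>

```rust
/// Hosted stand-in for the `Box` part of the kernel test above.
/// Returns the text that the `println!` call would produce (without newline).
pub fn box_demo() -> String {
    let mut heap_test = Box::new(42);
    *heap_test -= 15; // modify the heap-allocated integer in place
    let heap_test2 = Box::new("hello");
    format!("{:?} {:?}", heap_test, heap_test2)
}

/// Hosted stand-in for the `Vec` part of the kernel test above.
/// Returns the concatenated text of all `print!` calls in the loop.
pub fn vec_demo() -> String {
    let mut vec_test = vec![1, 2, 3, 4, 5, 6, 7];
    vec_test[3] = 42; // overwrite the fourth element
    let mut out = String::new();
    for i in &vec_test {
        out.push_str(&format!("{} ", i));
    }
    out
}
```

<p>With the heap mapped correctly, the kernel should therefore print <code>27 &quot;hello&quot;</code> followed by <code>1 2 3 42 5 6 7</code>.</p>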
<p>We can also use all other types of the <code>alloc</code> crate, including:</p> <ul> <li>the reference counted pointers <a href="https://doc.rust-lang.org/1.10.0/alloc/rc/">Rc</a> and <a href="https://doc.rust-lang.org/1.10.0/alloc/arc/">Arc</a></li> <li>the owned string type <a href="https://doc.rust-lang.org/1.10.0/collections/string/struct.String.html">String</a> and the <a href="https://doc.rust-lang.org/1.10.0/collections/macro.format!.html">format!</a> macro</li> <li><a href="https://doc.rust-lang.org/1.10.0/collections/linked_list/struct.LinkedList.html">Linked List</a></li> <li>the growable ring buffer <a href="https://doc.rust-lang.org/1.10.0/collections/vec_deque/struct.VecDeque.html">VecDeque</a></li> <li><a href="https://doc.rust-lang.org/1.10.0/collections/binary_heap/struct.BinaryHeap.html">BinaryHeap</a></li> <li><a href="https://doc.rust-lang.org/1.10.0/collections/btree_map/struct.BTreeMap.html">BTreeMap</a> and <a href="https://doc.rust-lang.org/1.10.0/collections/btree_set/struct.BTreeSet.html">BTreeSet</a></li> </ul> <h2 id="a-better-allocator"><a class="zola-anchor" href="#a-better-allocator" aria-label="Anchor link for: a-better-allocator">🔗</a>A better Allocator</h2> <p>Right now, we leak every freed memory block. 
Thus, we run out of memory quickly, for example, by creating a new <code>String</code> in each iteration of a loop:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in rust_main in src/lib.rs </span><span> </span><span style="color:#569cd6;">for</span><span> i </span><span style="color:#569cd6;">in </span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">10000 </span><span>{ </span><span> format!(</span><span style="color:#d69d85;">&quot;Some String&quot;</span><span>); </span><span>} </span></code></pre> <p>To fix this, we need to create an allocator that keeps track of freed memory blocks and reuses them if possible. This introduces some challenges:</p> <ul> <li>We need to keep track of a possibly unlimited number of freed blocks. For example, an application could allocate <code>n</code> one-byte sized blocks and free every second block, which creates <code>n/2</code> freed blocks. We can’t rely on any upper bound on the number of freed blocks since <code>n</code> could be arbitrarily large.</li> <li>We can’t use any of the collections from above, since they rely on allocations themselves. (It might be possible as soon as <a href="https://github.com/rust-lang/rfcs/blob/master/text/1398-kinds-of-allocators.md">RFC #1398</a> is <a href="https://github.com/rust-lang/rust/issues/32838">implemented</a>, which allows user-defined allocators for specific collection instances.)</li> <li>We need to merge adjacent freed blocks if possible. Otherwise, the freed memory is no longer usable for large allocations. 
We will discuss this point in more detail below.</li> <li>Our allocator should search the set of freed blocks quickly and keep fragmentation low.</li> </ul> <h3 id="creating-a-list-of-freed-blocks"><a class="zola-anchor" href="#creating-a-list-of-freed-blocks" aria-label="Anchor link for: creating-a-list-of-freed-blocks">🔗</a>Creating a List of freed Blocks</h3> <p>Where do we store the information about an unlimited number of freed blocks? We can’t use any fixed-size data structure since it could always be too small for some allocation sequences. So we need some kind of dynamically growing set.</p> <p>One possible solution could be to use an array-like data structure that starts at some unused virtual address. If the array becomes full, we increase its size and map new physical frames as backing storage. This approach would require a large part of the virtual address space since the array could grow significantly. We would need to create a custom implementation of a growable array and manipulate the page tables when deallocating. It would also consume a possibly large number of physical frames as backing storage.</p> <p>We will choose another solution with different tradeoffs. It’s not clearly “better” than the approach above and has significant disadvantages itself. However, it has one big advantage: It does not need any additional physical or virtual memory at all. This makes it less complex since we don’t need to manipulate any page tables. The idea is the following:</p> <p>A freed memory block is not used anymore and no one needs the stored information. It is still mapped to a virtual address and backed by a physical frame. So we just store the information about the freed block <em>in the block itself</em>. We keep a pointer to the first block and store a pointer to the next block in each block. 
Thus, we create a singly linked list:</p> <p><img src="https://os.phil-opp.com/kernel-heap/overview.svg" alt="Linked List Allocator" /></p> <p>In the following, we call a freed block a <em>hole</em>. Each hole stores its size and a pointer to the next hole. If a hole is larger than needed, we leave the remaining memory unused. By storing a pointer to the first hole, we are able to traverse the complete list.</p> <h4 id="initialization"><a class="zola-anchor" href="#initialization" aria-label="Anchor link for: initialization">🔗</a>Initialization</h4> <p>When the heap is created, all of its memory is unused. Thus, it forms a single large hole:</p> <p><img src="https://os.phil-opp.com/kernel-heap/initialization.svg" alt="Heap Initialization" /></p> <p>The optional pointer to the next hole is set to <code>None</code>.</p> <h4 id="allocation"><a class="zola-anchor" href="#allocation" aria-label="Anchor link for: allocation">🔗</a>Allocation</h4> <p>In order to allocate a block of memory, we need to find a hole that satisfies the size and alignment requirements. If the found hole is larger than required, we split it into two smaller holes. 
For example, when we allocate a 24 byte block right after initialization, we split the single hole into a hole of size 24 and a hole with the remaining size:</p> <p><img src="https://os.phil-opp.com/kernel-heap/split-hole.svg" alt="split hole" /></p> <p>Then we use the new 24 byte hole to perform the allocation:</p> <p><img src="https://os.phil-opp.com/kernel-heap/allocate.svg" alt="24 bytes allocated" /></p> <p>To find a suitable hole, we can use several search strategies:</p> <ul> <li><strong>best fit</strong>: Search the whole list and choose the <em>smallest</em> hole that satisfies the requirements.</li> <li><strong>worst fit</strong>: Search the whole list and choose the <em>largest</em> hole that satisfies the requirements.</li> <li><strong>first fit</strong>: Search the list from the beginning and choose the <em>first</em> hole that satisfies the requirements.</li> </ul> <p>Each strategy has its advantages and disadvantages. Best fit uses the smallest hole possible and leaves larger holes for large allocations. But splitting the smallest hole might create a tiny hole, which is too small for most allocations. In contrast, the worst fit strategy always chooses the largest hole. Thus, it does not create tiny holes, but it consumes the large block, which might be required for large allocations.</p> <p>For our use case, the best fit strategy is better than worst fit. The reason is that we have a minimal hole size of 16 bytes, since each hole needs to be able to store a size (8 bytes) and a pointer to the next hole (8 bytes). Thus, even the best fit strategy leads to holes of usable size. Furthermore, we will need to allocate very large blocks occasionally (e.g. for <a href="https://en.wikipedia.org/wiki/Direct_memory_access">DMA</a> buffers).</p> <p>However, both best fit and worst fit have a significant problem: They need to scan the whole list for each allocation in order to find the optimal block. This leads to long allocation times if the list is long. 
The first fit strategy does not have this problem, as it returns as soon as it finds a suitable hole. It is fairly fast for small allocations and might only need to scan the whole list for large allocations.</p> <h4 id="deallocation"><a class="zola-anchor" href="#deallocation" aria-label="Anchor link for: deallocation">🔗</a>Deallocation</h4> <p>To deallocate a block of memory, we can just insert its corresponding hole somewhere into the list. However, we need to merge adjacent holes. Otherwise, we are unable to reuse the freed memory for larger allocations. For example:</p> <p><img src="https://os.phil-opp.com/kernel-heap/deallocate.svg" alt="deallocate memory, which leads to adjacent holes" /></p> <p>In order to use these adjacent holes for a large allocation, we need to merge them into a single large hole first:</p> <p><img src="https://os.phil-opp.com/kernel-heap/merge-holes-and-allocate.svg" alt="merge adjacent holes and allocate large block" /></p> <p>The easiest way to ensure that adjacent holes are always merged is to keep the hole list sorted by address. Thus, we only need to check the predecessor and the successor in the list when we free a memory block. If they are adjacent to the freed block, we merge the corresponding holes. Otherwise, we insert the freed block as a new hole at the correct position.</p> <h3 id="implementation"><a class="zola-anchor" href="#implementation" aria-label="Anchor link for: implementation">🔗</a>Implementation</h3> <p>The detailed implementation would go beyond the scope of this post, since it contains several hidden difficulties. 
For example:</p> <ul> <li>Several merge cases: Merge with the previous hole, merge with the next hole, merge with both holes.</li> <li>We need to satisfy the alignment requirements, which requires additional splitting logic.</li> <li>The minimal hole size of 16 bytes: We must not create smaller holes when splitting a hole.</li> </ul> <p>I created the <a href="https://docs.rs/crate/linked_list_allocator/0.4.1">linked_list_allocator</a> crate to handle all of these cases. It consists of a <a href="https://docs.rs/linked_list_allocator/0.4.1/linked_list_allocator/struct.Heap.html">Heap struct</a> that provides an <code>allocate_first_fit</code> and a <code>deallocate</code> method. It also contains a <a href="https://docs.rs/linked_list_allocator/0.4.1/linked_list_allocator/struct.LockedHeap.html">LockedHeap</a> type that wraps <code>Heap</code> in a spinlock so that it’s usable as a static system allocator. If you are interested in the implementation details, check out the <a href="https://github.com/phil-opp/linked-list-allocator">source code</a>.</p> <p>We need to add the extern crate to our <code>Cargo.toml</code> and our <code>lib.rs</code>:</p> <pre data-lang="bash" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-bash "><code class="language-bash" data-lang="bash"><span>&gt; cargo add linked_list_allocator </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span style="color:#569cd6;">extern crate</span><span> linked_list_allocator; </span></code></pre> <p>Now we can change our global allocator:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>linked_list_allocator::LockedHeap; </span><span> </span><span>#[global_allocator] </span><span 
style="color:#569cd6;">static </span><span style="color:#b4cea8;">HEAP_ALLOCATOR</span><span>: LockedHeap = LockedHeap::empty(); </span></code></pre> <p>We can’t initialize the linked list allocator statically, since it needs to initialize the first hole (as described <a href="https://os.phil-opp.com/kernel-heap/#initialization">above</a>). This can’t be done at compile time, so the function can’t be a <code>const</code> function. Therefore we can only create an empty heap and initialize it later at runtime. For that, we add the following lines to our <code>rust_main</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>rust_main(multiboot_information_address: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> […] </span><span> </span><span> </span><span style="color:#608b4e;">// set up guard page and map the heap pages </span><span> memory::init(boot_info); </span><span> </span><span> </span><span style="color:#608b4e;">// initialize the heap allocator </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#b4cea8;">HEAP_ALLOCATOR</span><span>.lock().init(</span><span style="color:#b4cea8;">HEAP_START</span><span>, </span><span style="color:#b4cea8;">HEAP_START </span><span>+ </span><span style="color:#b4cea8;">HEAP_SIZE</span><span>); </span><span> } </span><span> […] </span><span>} </span></code></pre> <p>It is important that we initialize the heap <em>after</em> mapping the heap pages, since the init function writes to the heap memory (the first hole).</p> <p>Our kernel uses the new allocator now, so we can deallocate memory without leaking it. 
The example from above should now work without running out of memory:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in rust_main in src/lib.rs </span><span> </span><span style="color:#569cd6;">for</span><span> i </span><span style="color:#569cd6;">in </span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">10000 </span><span>{ </span><span> format!(</span><span style="color:#d69d85;">&quot;Some String&quot;</span><span>); </span><span>} </span></code></pre> <h3 id="performance"><a class="zola-anchor" href="#performance" aria-label="Anchor link for: performance">🔗</a>Performance</h3> <p>The linked list based approach has some performance problems. In the worst case, each allocation or deallocation needs to scan the complete list of holes. However, I think it’s good enough for now, since our heap will stay relatively small for the near future. If our allocator eventually becomes a performance problem, we can just replace it with a faster alternative.</p> <h2 id="summary"><a class="zola-anchor" href="#summary" aria-label="Anchor link for: summary">🔗</a>Summary</h2> <p>Now we’re able to use heap storage in our kernel without leaking memory. This allows us to effectively process dynamic data such as user-supplied strings in the future. We can also use <code>Rc</code> and <code>Arc</code> to create types with shared ownership. And we have access to various data structures such as <code>Vec</code> or <code>LinkedList</code>, which will make our lives much easier. 
We even have some well-tested and optimized <a href="https://en.wikipedia.org/wiki/Binary_heap">binary heap</a> and <a href="https://en.wikipedia.org/wiki/B-tree">B-tree</a> implementations!</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>This post concludes the section about memory management for now. We will revisit this topic eventually, but now it’s time to explore other topics. The upcoming posts will be about CPU exceptions and interrupts. We will catch page faults and double faults (and thereby avoid triple faults) and create a driver to read keyboard input. The <a href="https://os.phil-opp.com/handling-exceptions/">next post</a> starts by setting up a so-called <em>Interrupt Descriptor Table</em>.</p> Remap the Kernel Fri, 01 Jan 2016 00:00:00 +0000 https://os.phil-opp.com/remap-the-kernel/ https://os.phil-opp.com/remap-the-kernel/ <p>In this post we will create a new page table to map the kernel sections correctly. To do so, we will extend the paging module to support modifications of <em>inactive</em> page tables as well. Then we will switch to the new table and secure our kernel stack by creating a guard page.</p> <span id="continue-reading"></span> <p>As always, you can find the source code on <a href="https://github.com/phil-opp/blog_os/tree/first_edition_post_7">GitHub</a>. Don’t hesitate to file issues there if you have any problems or improvement suggestions. There is also a comment section at the end of this page. Note that this post requires a current Rust nightly.</p> <h2 id="motivation"><a class="zola-anchor" href="#motivation" aria-label="Anchor link for: motivation">🔗</a>Motivation</h2> <p>In the <a href="https://os.phil-opp.com/page-tables/">previous post</a>, we had a strange bug in the <code>unmap</code> function. The cause was a silent stack overflow, which corrupted the page tables. 
Fortunately, our kernel stack is right above the page tables so that we noticed the overflow relatively quickly. This won’t be the case when we add threads with new stacks in the future. Then a silent stack overflow could overwrite some data without us noticing, and some completely unrelated function would eventually fail because a variable changed its value.</p> <p>As you can imagine, these kinds of bugs are horrendous to debug. For that reason we will create a new hierarchical page table in this post, which has a <em>guard page</em> below the stack. A guard page is basically an unmapped page that causes a page fault when accessed. Thus we can catch stack overflows right when they happen.</p> <p>Also, we will use the <a href="https://os.phil-opp.com/allocating-frames/#kernel-elf-sections">information about kernel sections</a> to map the various sections individually instead of blindly mapping the first gigabyte. To improve safety even further, we will set the correct page table flags for the various sections. Thus it won’t be possible to modify the contents of <code>.text</code> or to execute code from <code>.data</code> anymore.</p> <h2 id="preparation"><a class="zola-anchor" href="#preparation" aria-label="Anchor link for: preparation">🔗</a>Preparation</h2> <p>There are many things that can go wrong when we switch to a new table. Therefore it’s a good idea to <a href="https://os.phil-opp.com/set-up-gdb/">set up a debugger</a>. You should not need it when you follow this post, but it’s good to know how to debug a problem when it occurs<sup class="footnote-reference"><a href="#fn-debug-notes">1</a></sup>.</p> <p>We also update the <code>Page</code> and <code>Frame</code> types to make our lives easier. 
The <code>Page</code> struct gets some derived traits:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/paging/mod.rs </span><span> </span><span>#[derive(Debug, Clone, Copy)] </span><span style="color:#569cd6;">pub struct </span><span>Page { </span><span> number: </span><span style="color:#569cd6;">usize</span><span>, </span><span>} </span></code></pre> <p>By making it <a href="https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html">Copy</a>, we can still use it after passing it to functions such as <code>map_to</code>. We also make the <code>Page::containing_address</code> function public (if it isn’t already).</p> <p>The <code>Frame</code> type gets a <code>clone</code> method too, but it does not implement the <a href="https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html">Clone trait</a>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/mod.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Frame { </span><span> </span><span style="color:#569cd6;">... </span><span> </span><span style="color:#569cd6;">fn </span><span>clone(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; Frame { </span><span> Frame { number: self.number } </span><span> } </span><span>} </span></code></pre> <p>The big difference is that this <code>clone</code> method is private. If we implemented the <a href="https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html">Clone trait</a>, it would be public and usable from other modules. For example, other modules could abuse it to free the same frame twice in the frame allocator.</p> <p>So why do we implement <code>Copy</code> for <code>Page</code> and make even its constructor public, but keep <code>Frame</code> as private as possible? 
The reason is that we can easily check the status of a <code>Page</code> by looking at the page tables. For example, the <code>map_to</code> function can easily check that the given page is unused.</p> <p>We can’t do that for a <code>Frame</code>. If we wanted to be sure that a given frame is unused, we would need to look at all mapped <em>pages</em> and verify that none of them is mapped to the given frame. Since this is impractical, we need to rely on the fact that a passed <code>Frame</code> is always unused. For that reason it must not be possible to create a new <code>Frame</code> or to clone one from other modules. The only valid way to get a frame is to allocate it from a <code>FrameAllocator</code>.</p> <h2 id="recap-the-paging-module"><a class="zola-anchor" href="#recap-the-paging-module" aria-label="Anchor link for: recap-the-paging-module">🔗</a>Recap: The Paging Module</h2> <p>This post builds upon the post about <a href="https://os.phil-opp.com/page-tables/">page tables</a>, so let’s start by quickly recapitulating what we’ve done there.</p> <p>We created a <code>memory::paging</code> module, which reads and modifies the hierarchical page table through recursive mapping. The owner of the active P4 table and thus all subtables is an <code>ActivePageTable</code> struct, which must be instantiated only once.</p> <p>The <code>ActivePageTable</code> struct provides the following interface:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">/// Translates a virtual to the corresponding physical address. </span><span style="color:#608b4e;">/// Returns `None` if the address is not mapped. 
</span><span style="color:#569cd6;">pub fn </span><span>translate(</span><span style="color:#569cd6;">&amp;</span><span>self, virtual_address: VirtualAddress) -&gt; </span><span> Option&lt;PhysicalAddress&gt; </span><span>{</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span style="color:#608b4e;">/// Maps the page to the frame with the provided flags. </span><span style="color:#608b4e;">/// The `PRESENT` flag is added by default. Needs a </span><span style="color:#608b4e;">/// `FrameAllocator` as it might need to create new page tables. </span><span style="color:#569cd6;">pub fn </span><span>map_to&lt;A&gt;(</span><span style="color:#569cd6;">&amp;mut </span><span>self, </span><span> page: Page, </span><span> frame: Frame, </span><span> flags: EntryFlags, </span><span> allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A) </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span style="color:#608b4e;">/// Maps the page to some free frame with the provided flags. </span><span style="color:#608b4e;">/// The free frame is allocated from the given `FrameAllocator`. </span><span style="color:#569cd6;">pub fn </span><span>map&lt;A&gt;(</span><span style="color:#569cd6;">&amp;mut </span><span>self, page: Page, flags: EntryFlags, allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A) </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span style="color:#608b4e;">/// Identity map the given frame with the provided flags. </span><span style="color:#608b4e;">/// The `FrameAllocator` is used to create new page tables if needed. 
</span><span style="color:#569cd6;">pub fn </span><span>identity_map&lt;A&gt;(</span><span style="color:#569cd6;">&amp;mut </span><span>self, </span><span> frame: Frame, </span><span> flags: EntryFlags, </span><span> allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A) </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span> </span><span style="color:#608b4e;">/// Unmaps the given page and adds all freed frames to the given </span><span style="color:#608b4e;">/// `FrameAllocator`. </span><span style="color:#569cd6;">fn </span><span>unmap&lt;A&gt;(</span><span style="color:#569cd6;">&amp;mut </span><span>self, page: Page, allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A) </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{</span><span style="color:#569cd6;">...</span><span>} </span></code></pre> <h2 id="overview"><a class="zola-anchor" href="#overview" aria-label="Anchor link for: overview">🔗</a>Overview</h2> <p>Our goal is to use the <code>ActivePageTable</code> functions to map the kernel sections correctly in a new page table. 
In pseudo code:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>remap_the_kernel(boot_info: </span><span style="color:#569cd6;">&amp;</span><span>BootInformation) { </span><span> </span><span style="color:#569cd6;">let</span><span> new_table = create_new_table(); </span><span> </span><span> </span><span style="color:#569cd6;">for</span><span> section </span><span style="color:#569cd6;">in</span><span> boot_info.elf_sections { </span><span> </span><span style="color:#569cd6;">for</span><span> frame </span><span style="color:#569cd6;">in</span><span> section { </span><span> new_table.identity_map(frame, section.flags); </span><span> } </span><span> } </span><span> </span><span> new_table.activate(); </span><span> create_guard_page_for_stack(); </span><span>} </span></code></pre> <p>But the <code>ActivePageTable</code> methods – as the name suggests – only work for the <em>active table</em>. So we would need to activate <code>new_table</code> <em>before</em> we use <code>identity_map</code>. But this is not possible since it would cause an immediate page fault when the CPU tries to read the next instruction.</p> <p>So we need a way to use the <code>ActivePageTable</code> methods on <em>inactive</em> page tables as well.</p> <h2 id="inactive-tables"><a class="zola-anchor" href="#inactive-tables" aria-label="Anchor link for: inactive-tables">🔗</a>Inactive Tables</h2> <p>Let’s start by creating a type for inactive page tables. Like an <code>ActivePageTable</code>, an <code>InactivePageTable</code> owns a P4 table. 
The difference is that the inactive P4 table is not used by the CPU.</p> <p>We create the struct in <code>memory/paging/mod.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub struct </span><span>InactivePageTable { </span><span> p4_frame: Frame, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>InactivePageTable { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>new(frame: Frame) -&gt; InactivePageTable { </span><span> </span><span style="color:#608b4e;">// TODO zero and recursive map the frame </span><span> InactivePageTable { p4_frame: frame } </span><span> } </span><span>} </span></code></pre> <p>Without zeroing, the P4 table contains complete garbage and maps random memory. But we can’t zero it right now because the <code>p4_frame</code> is not mapped to a virtual address.</p> <p>Well, maybe it’s still part of the identity mapped first gigabyte. Then we could zero it without problems since the physical address would be a valid virtual address, too. But this “solution” is hacky and won’t work anymore after this post (since we will remove all needless identity mapping).</p> <p>Instead, we will temporarily map the frame to some virtual address.</p> <h3 id="temporary-mapping"><a class="zola-anchor" href="#temporary-mapping" aria-label="Anchor link for: temporary-mapping">🔗</a>Temporary Mapping</h3> <p>For this, we add a <code>TemporaryPage</code> struct. We create it in a new <code>temporary_page</code> submodule to keep the paging module clean. 
It looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// src/memory/paging/mod.rs </span><span style="color:#569cd6;">mod </span><span>temporary_page; </span><span> </span><span style="color:#608b4e;">// src/memory/paging/temporary_page.rs </span><span> </span><span style="color:#569cd6;">use super</span><span>::Page; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>TemporaryPage { </span><span> page: Page, </span><span>} </span></code></pre> <p>We add methods to temporarily map and unmap the page:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use super</span><span>::{ActivePageTable, VirtualAddress}; </span><span style="color:#569cd6;">use </span><span>memory::Frame; </span><span> </span><span style="color:#569cd6;">impl </span><span>TemporaryPage { </span><span> </span><span style="color:#608b4e;">/// Maps the temporary page to the given frame in the active table. </span><span> </span><span style="color:#608b4e;">/// Returns the start address of the temporary page. 
</span><span> </span><span style="color:#569cd6;">pub fn </span><span>map(</span><span style="color:#569cd6;">&amp;mut </span><span>self, frame: Frame, active_table: </span><span style="color:#569cd6;">&amp;mut</span><span> ActivePageTable) </span><span> -&gt; VirtualAddress </span><span> { </span><span> </span><span style="color:#569cd6;">use super</span><span>::entry::</span><span style="color:#b4cea8;">WRITABLE</span><span>; </span><span> </span><span> assert!(active_table.translate_page(self.page).is_none(), </span><span> </span><span style="color:#d69d85;">&quot;temporary page is already mapped&quot;</span><span>); </span><span> active_table.map_to(self.page, frame, </span><span style="color:#b4cea8;">WRITABLE</span><span>, </span><span style="color:#569cd6;">???</span><span>); </span><span> self.page.start_address() </span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">/// Unmaps the temporary page in the active table. </span><span> </span><span style="color:#569cd6;">pub fn </span><span>unmap(</span><span style="color:#569cd6;">&amp;mut </span><span>self, active_table: </span><span style="color:#569cd6;">&amp;mut</span><span> ActivePageTable) { </span><span> active_table.unmap(self.page, </span><span style="color:#569cd6;">???</span><span>) </span><span> } </span><span>} </span></code></pre> <p>The <code>???</code> needs to be some <code>FrameAllocator</code>. We could just add an additional <code>allocator</code> argument but there is a better solution.</p> <p>It takes advantage of the fact that we always map the same page. So the allocator only needs to hold 3 frames: one P3, one P2, and one P1 table (the P4 table is always mapped). 
This allows us to create a tiny allocator and add it as a field to the <code>TemporaryPage</code> struct itself:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub struct </span><span>TemporaryPage { </span><span> page: Page, </span><span> allocator: TinyAllocator, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>TemporaryPage { </span><span> </span><span style="color:#608b4e;">// as above, but with `&amp;mut self.allocator` instead of `???` </span><span>} </span><span> </span><span style="color:#569cd6;">struct </span><span>TinyAllocator([Option&lt;Frame&gt;; 3]); </span></code></pre> <p>Our tiny allocator just consists of 3 slots to store frames. It will be empty when the temporary page is mapped and full when all corresponding page tables are unmapped.</p> <p>To turn <code>TinyAllocator</code> into a frame allocator, we need to add the trait implementation:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>memory::FrameAllocator; </span><span> </span><span style="color:#569cd6;">impl </span><span>FrameAllocator </span><span style="color:#569cd6;">for </span><span>TinyAllocator { </span><span> </span><span style="color:#569cd6;">fn </span><span>allocate_frame(</span><span style="color:#569cd6;">&amp;mut </span><span>self) -&gt; Option&lt;Frame&gt; { </span><span> </span><span style="color:#569cd6;">for</span><span> frame_option </span><span style="color:#569cd6;">in &amp;mut </span><span>self.</span><span style="color:#b5cea8;">0 </span><span>{ </span><span> </span><span style="color:#569cd6;">if</span><span> frame_option.is_some() { </span><span> </span><span style="color:#569cd6;">return</span><span> frame_option.take(); </span><span> } </span><span> } </span><span> None 
</span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>deallocate_frame(</span><span style="color:#569cd6;">&amp;mut </span><span>self, frame: Frame) { </span><span> </span><span style="color:#569cd6;">for</span><span> frame_option </span><span style="color:#569cd6;">in &amp;mut </span><span>self.</span><span style="color:#b5cea8;">0 </span><span>{ </span><span> </span><span style="color:#569cd6;">if</span><span> frame_option.is_none() { </span><span> *frame_option = Some(frame); </span><span> </span><span style="color:#569cd6;">return</span><span>; </span><span> } </span><span> } </span><span> panic!(</span><span style="color:#d69d85;">&quot;Tiny allocator can hold only 3 frames.&quot;</span><span>); </span><span> } </span><span>} </span></code></pre> <p>On allocation, we use the <a href="https://doc.rust-lang.org/nightly/core/option/enum.Option.html#method.take">Option::take</a> function to take an available frame from the first filled slot and on deallocation, we put the frame back into the first free slot.</p> <p>To finish the <code>TinyAllocator</code>, we add a constructor that fills it from some other allocator:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">impl </span><span>TinyAllocator { </span><span> </span><span style="color:#569cd6;">fn </span><span>new&lt;A&gt;(allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A) -&gt; TinyAllocator </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span> { </span><span> </span><span style="color:#569cd6;">let mut </span><span>f </span><span style="color:#569cd6;">= </span><span>|| allocator.allocate_frame(); </span><span> </span><span style="color:#569cd6;">let</span><span> frames = [f(), f(), f()]; </span><span> TinyAllocator(frames) </span><span> } </span><span>} </span></code></pre> <p>We 
use a little closure here that saves us some typing.</p> <p>Now our <code>TemporaryPage</code> type is nearly complete. We only add one more method for convenience:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use super</span><span>::table::{Table, Level1}; </span><span> </span><span style="color:#608b4e;">/// Maps the temporary page to the given page table frame in the active </span><span style="color:#608b4e;">/// table. Returns a reference to the now mapped table. </span><span style="color:#569cd6;">pub fn </span><span>map_table_frame(</span><span style="color:#569cd6;">&amp;mut </span><span>self, </span><span> frame: Frame, </span><span> active_table: </span><span style="color:#569cd6;">&amp;mut</span><span> ActivePageTable) </span><span> -&gt; </span><span style="color:#569cd6;">&amp;mut </span><span>Table&lt;Level1&gt; { </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;mut </span><span>*(self.map(frame, active_table) </span><span style="color:#569cd6;">as *mut </span><span>Table&lt;Level1&gt;) } </span><span>} </span></code></pre> <p>This function interprets the given frame as a page table frame and returns a <code>Table</code> reference. We return a table of level 1 because it <a href="https://os.phil-opp.com/page-tables/#some-clever-solution">forbids calling the <code>next_table</code> methods</a>. Calling <code>next_table</code> must not be possible since it’s not a page of the recursive mapping. 
To be able to return a <code>Table&lt;Level1&gt;</code>, we need to make the <code>Level1</code> enum in <code>memory/paging/table.rs</code> public.</p> <p>The <code>unsafe</code> block is safe since the <code>VirtualAddress</code> returned by the <code>map</code> function is always valid and the type cast just reinterprets the frame’s content.</p> <p>To complete the <code>temporary_page</code> module, we add a <code>TemporaryPage::new</code> constructor:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>new&lt;A&gt;(page: Page, allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A) -&gt; TemporaryPage </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{ </span><span> TemporaryPage { </span><span> page: page, </span><span> allocator: TinyAllocator::new(allocator), </span><span> } </span><span>} </span></code></pre> <h3 id="zeroing-the-inactivepagetable"><a class="zola-anchor" href="#zeroing-the-inactivepagetable" aria-label="Anchor link for: zeroing-the-inactivepagetable">🔗</a>Zeroing the InactivePageTable</h3> <p>Now we can use <code>TemporaryPage</code> to fix our <code>InactivePageTable::new</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/paging/mod.rs </span><span> </span><span style="color:#569cd6;">use </span><span>self::temporary_page::TemporaryPage; </span><span> </span><span style="color:#569cd6;">impl </span><span>InactivePageTable { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>new(frame: Frame, </span><span> active_table: </span><span style="color:#569cd6;">&amp;mut</span><span> ActivePageTable, </span><span> temporary_page: </span><span 
style="color:#569cd6;">&amp;mut</span><span> TemporaryPage) </span><span> -&gt; InactivePageTable { </span><span> { </span><span> </span><span style="color:#569cd6;">let</span><span> table = temporary_page.map_table_frame(frame.clone(), </span><span> active_table); </span><span> </span><span style="color:#608b4e;">// now we are able to zero the table </span><span> table.zero(); </span><span> </span><span style="color:#608b4e;">// set up recursive mapping for the table </span><span> table[</span><span style="color:#b5cea8;">511</span><span>].set(frame.clone(), </span><span style="color:#b4cea8;">PRESENT </span><span style="color:#569cd6;">| </span><span style="color:#b4cea8;">WRITABLE</span><span>); </span><span> } </span><span> temporary_page.unmap(active_table); </span><span> </span><span> InactivePageTable { p4_frame: frame } </span><span> } </span><span>} </span></code></pre> <p>We added two new arguments, <code>active_table</code> and <code>temporary_page</code>. We need an <a href="https://doc.rust-lang.org/rust-by-example/variable_bindings/scope.html">inner scope</a> to ensure that the <code>table</code> variable is dropped before we try to unmap the temporary page again. This is required since the <code>table</code> variable exclusively borrows <code>temporary_page</code> as long as it’s alive.</p> <p>Now we are able to create valid inactive page tables, which are zeroed and recursively mapped. But we still can’t modify them. To resolve this problem, we need to look at recursive mapping again.</p> <h2 id="revisiting-recursive-mapping"><a class="zola-anchor" href="#revisiting-recursive-mapping" aria-label="Anchor link for: revisiting-recursive-mapping">🔗</a>Revisiting Recursive Mapping</h2> <p>Recursive mapping works by mapping the last P4 entry to the P4 table itself. 
Thus we can access the page tables by looping one or more times.</p> <p>For example, accessing a P3 table requires looping three times:</p> <p><img src="https://os.phil-opp.com/remap-the-kernel/recursive_mapping_access_p3.svg" alt="access active P3 table through recursive mapping" /></p> <p>We can use the same mechanism to access inactive tables. The trick is to change the recursive mapping of the active P4 table to point to the inactive P4 table:</p> <p><img src="https://os.phil-opp.com/remap-the-kernel/recursive_mapping_access_p3_inactive_table.svg" alt="access inactive P3 table through recursive mapping" /></p> <p>Now the inactive table can be accessed exactly like the active table; even the magic addresses are the same. This allows us to use the <code>ActivePageTable</code> interface and the existing mapping methods for inactive tables, too. Note that everything besides the recursive mapping continues to work exactly as before since we’ve never changed the active table in the CPU.</p> <h3 id="implementation-draft"><a class="zola-anchor" href="#implementation-draft" aria-label="Anchor link for: implementation-draft">🔗</a>Implementation Draft</h3> <p>We add a method to <code>ActivePageTable</code> that temporarily changes the recursive mapping and executes a given closure in the new context:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>with&lt;F&gt;(</span><span style="color:#569cd6;">&amp;mut </span><span>self, </span><span> table: </span><span style="color:#569cd6;">&amp;mut</span><span> InactivePageTable, </span><span> f: F) </span><span> </span><span style="color:#569cd6;">where</span><span> F: FnOnce(</span><span style="color:#569cd6;">&amp;mut</span><span> ActivePageTable) </span><span>{ </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::tlb; </span><span> </span><span>
</span><span style="color:#608b4e;">// overwrite recursive mapping </span><span> self.p4_mut()[</span><span style="color:#b5cea8;">511</span><span>].set(table.p4_frame.clone(), </span><span style="color:#b4cea8;">PRESENT </span><span style="color:#569cd6;">| </span><span style="color:#b4cea8;">WRITABLE</span><span>); </span><span> tlb::flush_all(); </span><span> </span><span> </span><span style="color:#608b4e;">// execute f in the new context </span><span> f(self); </span><span> </span><span> </span><span style="color:#608b4e;">// TODO restore recursive mapping to original p4 table </span><span>} </span></code></pre> <p>It overwrites the last P4 entry (index 511) and points it to the inactive table frame. Then it flushes the <a href="https://wiki.osdev.org/TLB">translation lookaside buffer (TLB)</a>, which still contains some old translations. We need to flush all pages that are part of the recursive mapping, so the easiest way is to flush the TLB completely.</p> <p>Now that the recursive mapping points to the given inactive table, we execute the closure in the new context. The closure can call all active table methods such as <code>translate</code> or <code>map_to</code>. It could even call <code>with</code> again and chain another inactive table! Wait… that would not work:</p> <p><img src="https://os.phil-opp.com/remap-the-kernel/recursive_mapping_access_p1_invalid_chaining.svg" alt="access inactive P3 table through recursive mapping" /></p> <p>Here the closure called <code>with</code> again and thus changed the recursive mapping of the inactive table to point to a second inactive table. Now we want to modify the P1 of the <em>second</em> inactive table, but instead we land on the P1 of the <em>first</em> inactive table since we never follow the pointer to the second table. Only when modifying the P2, P3, or P4 table do we really access the second inactive table. 
This inconsistency would break our mapping functions completely.</p> <p>So we should really prohibit the closure from calling <code>with</code> again. We could add some runtime assertion that panics when the active table is not recursively mapped anymore. But a cleaner solution is to split off the mapping code from <code>ActivePageTable</code> into a new <code>Mapper</code> type.</p> <h3 id="refactoring"><a class="zola-anchor" href="#refactoring" aria-label="Anchor link for: refactoring">🔗</a>Refactoring</h3> <p>We start by creating a new <code>memory/paging/mapper.rs</code> submodule and moving the <code>ActivePageTable</code> struct and its <code>impl</code> block to it. Then we rename it to <code>Mapper</code> and make all methods public (so we can still use them from the paging module). The <code>with</code> method is removed.</p> <p>After adjusting the imports, the module should look like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in memory/paging/mod.rs </span><span style="color:#569cd6;">mod </span><span>mapper; </span><span> </span><span style="color:#608b4e;">// memory/paging/mapper.rs </span><span> </span><span style="color:#569cd6;">use super</span><span>::{VirtualAddress, PhysicalAddress, Page, </span><span style="color:#b4cea8;">ENTRY_COUNT</span><span>}; </span><span style="color:#569cd6;">use super</span><span>::entry::*; </span><span style="color:#569cd6;">use super</span><span>::table::{self, Table, Level4, Level1}; </span><span style="color:#569cd6;">use </span><span>memory::{</span><span style="color:#b4cea8;">PAGE_SIZE</span><span>, Frame, FrameAllocator}; </span><span style="color:#569cd6;">use </span><span>core::ptr::Unique; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>Mapper { </span><span> p4: Unique&lt;Table&lt;Level4&gt;&gt;, </span><span>} </span><span> </span><span
style="color:#569cd6;">impl </span><span>Mapper { </span><span> </span><span style="color:#569cd6;">pub unsafe fn </span><span>new() -&gt; Mapper {</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>p4(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; </span><span style="color:#569cd6;">&amp;</span><span>Table&lt;Level4&gt; {</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span> </span><span style="color:#608b4e;">// the remaining mapping methods, all public </span><span>} </span></code></pre> <p>Now we create a new <code>ActivePageTable</code> struct in <code>memory/paging/mod.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub use </span><span>self::mapper::Mapper; </span><span style="color:#569cd6;">use </span><span>core::ops::{Deref, DerefMut}; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>ActivePageTable { </span><span> mapper: Mapper, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>Deref </span><span style="color:#569cd6;">for </span><span>ActivePageTable { </span><span> </span><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">Target </span><span>= Mapper; </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>deref(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; </span><span style="color:#569cd6;">&amp;</span><span>Mapper { </span><span> </span><span style="color:#569cd6;">&amp;</span><span>self.mapper </span><span> } </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>DerefMut </span><span style="color:#569cd6;">for </span><span>ActivePageTable { </span><span> </span><span style="color:#569cd6;">fn </span><span>deref_mut(</span><span 
style="color:#569cd6;">&amp;mut </span><span>self) -&gt; </span><span style="color:#569cd6;">&amp;mut</span><span> Mapper { </span><span> </span><span style="color:#569cd6;">&amp;mut </span><span>self.mapper </span><span> } </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>ActivePageTable { </span><span> </span><span style="color:#569cd6;">unsafe fn </span><span>new() -&gt; ActivePageTable { </span><span> ActivePageTable { </span><span> mapper: Mapper::new(), </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>with&lt;F&gt;(</span><span style="color:#569cd6;">&amp;mut </span><span>self, </span><span> table: </span><span style="color:#569cd6;">&amp;mut</span><span> InactivePageTable, </span><span> f: F) </span><span> </span><span style="color:#569cd6;">where</span><span> F: FnOnce(</span><span style="color:#569cd6;">&amp;mut</span><span> Mapper) </span><span style="color:#608b4e;">// `Mapper` instead of `ActivePageTable` </span><span> {</span><span style="color:#569cd6;">...</span><span>} </span><span>} </span></code></pre> <p>The <a href="https://doc.rust-lang.org/nightly/core/ops/trait.Deref.html">Deref</a> and <a href="https://doc.rust-lang.org/nightly/core/ops/trait.DerefMut.html">DerefMut</a> implementations allow us to use the <code>ActivePageTable</code> exactly as before, for example we still can call <code>map_to</code> on it (because of <a href="https://doc.rust-lang.org/nightly/book/deref-coercions.html">deref coercions</a>). But the closure called in the <code>with</code> function can no longer invoke <code>with</code> again. 
The reason is that we changed the type of the generic <code>F</code> parameter a bit: Instead of an <code>ActivePageTable</code>, the closure just gets a <code>Mapper</code> as argument.</p> <h3 id="restoring-the-recursive-mapping"><a class="zola-anchor" href="#restoring-the-recursive-mapping" aria-label="Anchor link for: restoring-the-recursive-mapping">🔗</a>Restoring the Recursive Mapping</h3> <p>Right now, the <code>with</code> function overwrites the recursive mapping and calls the closure. But it does not restore the previous recursive mapping yet. So let’s fix that!</p> <p>To back up the physical P4 frame of the active table, we can either read it from the 511th P4 entry (before we change it) or from the CR3 control register directly. We will do the latter as it should be faster and we already have an external crate that makes it easy:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>x86_64::registers::control_regs; </span><span style="color:#569cd6;">let</span><span> backup = Frame::containing_address( </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ control_regs::cr3() } </span><span style="color:#569cd6;">as usize </span><span>); </span></code></pre> <p>Why is it unsafe? Because reading the CR3 register leads to a CPU exception if the processor is not running in kernel mode (<a href="https://wiki.osdev.org/Security#Low-level_Protection_Mechanisms">Ring 0</a>). But this code will always run in kernel mode, so the <code>unsafe</code> block is completely safe here.</p> <p>Now that we have a backup of the original P4 frame, we need a way to restore it after the closure has run. So we need to somehow modify the 511th entry of the original P4 frame, which is still the active table in the CPU.
But we can’t access it because the recursive mapping now points to the inactive table:</p> <p><img src="https://os.phil-opp.com/remap-the-kernel/recursive_mapping_inactive_table_scheme.svg" alt="it’s not possible to access the original P4 through recursive mapping anymore" /></p> <p>It’s just not possible to access the active P4 entry in 4 steps, so we can’t reach it through the 4-level page table.</p> <p>We could try to overwrite the recursive mapping of the <em>inactive</em> P4 table and point it back to the original P4 frame:</p> <p><img src="https://os.phil-opp.com/remap-the-kernel/cyclic_mapping_inactive_tables.svg" alt="cyclic map active and inactive P4 tables" /></p> <p>Now we can reach the active P4 entry in 4 steps and could restore the original mapping in the active table. But this hack has a drawback: The inactive table is now invalid since it is no longer recursively mapped. We would need to fix it by using a temporary page again (as above).</p> <p>But if we need a temporary page anyway, we can just use it to map the original P4 frame directly. Thus we avoid the above hack and make the code simpler.
So let’s do it that way.</p> <h3 id="completing-the-implementation"><a class="zola-anchor" href="#completing-the-implementation" aria-label="Anchor link for: completing-the-implementation">🔗</a>Completing the Implementation</h3> <p>The <code>with</code> method gets an additional <code>TemporaryPage</code> argument, which we use to back up and restore the original recursive mapping:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>with&lt;F&gt;(</span><span style="color:#569cd6;">&amp;mut </span><span>self, </span><span> table: </span><span style="color:#569cd6;">&amp;mut</span><span> InactivePageTable, </span><span> temporary_page: </span><span style="color:#569cd6;">&amp;mut </span><span>temporary_page::TemporaryPage, </span><span style="color:#608b4e;">// new </span><span> f: F) </span><span> </span><span style="color:#569cd6;">where</span><span> F: FnOnce(</span><span style="color:#569cd6;">&amp;mut</span><span> Mapper) </span><span>{ </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::tlb; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::registers::control_regs; </span><span> </span><span> { </span><span> </span><span style="color:#569cd6;">let</span><span> backup = Frame::containing_address( </span><span> control_regs::cr3().</span><span style="color:#b5cea8;">0 </span><span style="color:#569cd6;">as usize</span><span>); </span><span> </span><span> </span><span style="color:#608b4e;">// map temporary_page to current p4 table </span><span> </span><span style="color:#569cd6;">let</span><span> p4_table = temporary_page.map_table_frame(backup.clone(), self); </span><span> </span><span> </span><span style="color:#608b4e;">// overwrite recursive mapping </span><span> self.p4_mut()[</span><span style="color:#b5cea8;">511</span><span>].set(table.p4_frame.clone(), 
</span><span style="color:#b4cea8;">PRESENT </span><span style="color:#569cd6;">| </span><span style="color:#b4cea8;">WRITABLE</span><span>); </span><span> tlb::flush_all(); </span><span> </span><span> </span><span style="color:#608b4e;">// execute f in the new context </span><span> f(self); </span><span> </span><span> </span><span style="color:#608b4e;">// restore recursive mapping to original p4 table </span><span> p4_table[</span><span style="color:#b5cea8;">511</span><span>].set(backup, </span><span style="color:#b4cea8;">PRESENT </span><span style="color:#569cd6;">| </span><span style="color:#b4cea8;">WRITABLE</span><span>); </span><span> tlb::flush_all(); </span><span> } </span><span> </span><span> temporary_page.unmap(self); </span><span>} </span></code></pre> <p>Again, the inner scope is needed to end the borrow of <code>temporary_page</code> so that we can unmap it again. Note that we need to flush the TLB another time after we restored the original recursive mapping.</p> <p>Now the <code>with</code> function is ready to be used!</p> <h2 id="remapping-the-kernel"><a class="zola-anchor" href="#remapping-the-kernel" aria-label="Anchor link for: remapping-the-kernel">🔗</a>Remapping the Kernel</h2> <p>Let’s tackle the main task of this post: remapping the kernel sections. 
To do that, we create a <code>remap_the_kernel</code> function in <code>memory/paging/mod.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>multiboot2::BootInformation; </span><span style="color:#569cd6;">use </span><span>memory::{</span><span style="color:#b4cea8;">PAGE_SIZE</span><span>, Frame, FrameAllocator}; </span><span> </span><span style="color:#569cd6;">pub fn </span><span>remap_the_kernel&lt;A&gt;(allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A, boot_info: </span><span style="color:#569cd6;">&amp;</span><span>BootInformation) </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{ </span><span> </span><span style="color:#569cd6;">let mut</span><span> temporary_page = TemporaryPage::new(Page { number: </span><span style="color:#b5cea8;">0xcafebabe </span><span>}, </span><span> allocator); </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> active_table = </span><span style="color:#569cd6;">unsafe </span><span>{ ActivePageTable::new() }; </span><span> </span><span style="color:#569cd6;">let mut</span><span> new_table = { </span><span> </span><span style="color:#569cd6;">let</span><span> frame = allocator.allocate_frame().expect(</span><span style="color:#d69d85;">&quot;no more frames&quot;</span><span>); </span><span> InactivePageTable::new(frame, </span><span style="color:#569cd6;">&amp;mut</span><span> active_table, </span><span style="color:#569cd6;">&amp;mut</span><span> temporary_page) </span><span> }; </span><span> </span><span> active_table.with(</span><span style="color:#569cd6;">&amp;mut</span><span> new_table, </span><span style="color:#569cd6;">&amp;mut</span><span> temporary_page, |mapper| { </span><span> </span><span style="color:#569cd6;">let</span><span> elf_sections_tag = 
boot_info.elf_sections_tag() </span><span> .expect(</span><span style="color:#d69d85;">&quot;Elf sections tag required&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">for</span><span> section </span><span style="color:#569cd6;">in</span><span> elf_sections_tag.sections() { </span><span> </span><span style="color:#608b4e;">// TODO mapper.identity_map() all pages of `section` </span><span> } </span><span> }); </span><span>} </span></code></pre> <p>First, we create a temporary page at page number <code>0xcafebabe</code>. We could use <code>0xdeadbeaf</code> or <code>0x123456789</code> as well, as long as the page is unused. The <code>active_table</code> and the <code>new_table</code> are created using their constructor functions.</p> <p>Then we use the <code>with</code> function to temporarily change the recursive mapping and execute the closure as if the <code>new_table</code> were active. This allows us to map the sections in the new table without changing the active mapping.
To get the kernel sections, we use the <a href="https://os.phil-opp.com/allocating-frames/#the-multiboot-information-structure">Multiboot information structure</a>.</p> <p>Let’s resolve the above <code>TODO</code> by identity mapping the sections:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">for</span><span> section </span><span style="color:#569cd6;">in</span><span> elf_sections_tag.sections() { </span><span> </span><span style="color:#569cd6;">use </span><span>self::entry::</span><span style="color:#b4cea8;">WRITABLE</span><span>; </span><span> </span><span> </span><span style="color:#569cd6;">if !</span><span>section.is_allocated() { </span><span> </span><span style="color:#608b4e;">// section is not loaded to memory </span><span> </span><span style="color:#569cd6;">continue</span><span>; </span><span> } </span><span> assert!(section.start_address() % </span><span style="color:#b4cea8;">PAGE_SIZE </span><span>== </span><span style="color:#b5cea8;">0</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;sections need to be page aligned&quot;</span><span>); </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;mapping section at addr: </span><span style="color:#b4cea8;">{:#x}</span><span style="color:#d69d85;">, size: </span><span style="color:#b4cea8;">{:#x}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span> section.addr, section.size); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> flags = </span><span style="color:#b4cea8;">WRITABLE</span><span>; </span><span style="color:#608b4e;">// TODO use real section flags </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> start_frame = Frame::containing_address(section.start_address()); </span><span> </span><span style="color:#569cd6;">let</span><span> end_frame = 
Frame::containing_address(section.end_address() - </span><span style="color:#b5cea8;">1</span><span>); </span><span> </span><span style="color:#569cd6;">for</span><span> frame </span><span style="color:#569cd6;">in </span><span>Frame::range_inclusive(start_frame, end_frame) { </span><span> mapper.identity_map(frame, flags, allocator); </span><span> } </span><span>} </span></code></pre> <p>We skip all sections that were not loaded into memory (e.g. debug sections). We require that all sections are page aligned because a page must not contain sections with different flags. For example, we would need to set the <code>EXECUTABLE</code> and <code>WRITABLE</code> flags for a page that contains parts of the <code>.text</code> and <code>.data</code> section. Thus we could modify the running code or execute bytes from the <code>.data</code> section as code.</p> <p>To map a section, we iterate over all frames of the section by using a new <code>Frame::range_inclusive</code> function (shown below). Note that the end address is exclusive, so that it’s not part of the section anymore (it’s the first byte of the next section).
Thus we need to subtract 1 to get the <code>end_frame</code>.</p> <p>The <code>Frame::range_inclusive</code> function looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/mod.rs </span><span> </span><span style="color:#569cd6;">impl </span><span>Frame { </span><span> </span><span style="color:#569cd6;">fn </span><span>range_inclusive(start: Frame, end: Frame) -&gt; FrameIter { </span><span> FrameIter { </span><span> start: start, </span><span> end: end, </span><span> } </span><span> } </span><span>} </span><span> </span><span style="color:#569cd6;">struct </span><span>FrameIter { </span><span> start: Frame, </span><span> end: Frame, </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>Iterator </span><span style="color:#569cd6;">for </span><span>FrameIter { </span><span> </span><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">Item </span><span>= Frame; </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>next(</span><span style="color:#569cd6;">&amp;mut </span><span>self) -&gt; Option&lt;Frame&gt; { </span><span> </span><span style="color:#569cd6;">if </span><span>self.start &lt;= self.end { </span><span> </span><span style="color:#569cd6;">let</span><span> frame = self.start.clone(); </span><span> self.start.number += </span><span style="color:#b5cea8;">1</span><span>; </span><span> Some(frame) </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> None </span><span> } </span><span> } </span><span> } </span></code></pre> <p>Instead of creating a custom iterator, we could have used the <a href="https://doc.rust-lang.org/nightly/core/ops/struct.Range.html">Range</a> struct of the standard library. 
But it requires that we implement the <a href="https://doc.rust-lang.org/1.10.0/core/num/trait.One.html">One</a> and <a href="https://doc.rust-lang.org/nightly/core/ops/trait.Add.html">Add</a> traits for <code>Frame</code>. Then every module could perform arithmetic operations on frames, for example <code>let frame3 = frame1 + frame2</code>. This would violate our safety invariants because <code>frame3</code> could be already in use. The <code>range_inclusive</code> function does not have these problems because it is only available inside the <code>memory</code> module.</p> <h3 id="page-align-sections"><a class="zola-anchor" href="#page-align-sections" aria-label="Anchor link for: page-align-sections">🔗</a>Page Align Sections</h3> <p>Right now our sections aren’t page aligned, so the assertion in <code>remap_the_kernel</code> would fail. We can fix this by making the section size a multiple of the page size. To do this, we add an <code>ALIGN</code> statement to all sections in the linker file. For example:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>SECTIONS { </span><span> . = 1M; </span><span> </span><span> .text : </span><span> { </span><span> *(.text .text.*) </span><span> . = ALIGN(4K); </span><span> } </span><span>} </span></code></pre> <p>The <code>.</code> is the “current location counter” and represents the current virtual address. At the beginning of the <code>SECTIONS</code> tag we set it to <code>1M</code>, so our kernel starts at 1MiB. We use the <a href="https://www.math.utah.edu/docs/info/ld_3.html#SEC12">ALIGN</a> function to align the current location counter to the next <code>4K</code> boundary (<code>4K</code> is the page size). 
Thus the end of the <code>.text</code> section – and the beginning of the next section – are page aligned.</p> <p>To put all sections on their own page, we add the <code>ALIGN</code> statement to all of them:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>/* src/arch/x86_64/linker.ld */ </span><span> </span><span>ENTRY(start) </span><span> </span><span>SECTIONS { </span><span> . = 1M; </span><span> </span><span> .rodata : </span><span> { </span><span> /* ensure that the multiboot header is at the beginning */ </span><span> KEEP(*(.multiboot_header)) </span><span> *(.rodata .rodata.*) </span><span> . = ALIGN(4K); </span><span> } </span><span> </span><span> .text : </span><span> { </span><span> *(.text .text.*) </span><span> . = ALIGN(4K); </span><span> } </span><span> </span><span> .data : </span><span> { </span><span> *(.data .data.*) </span><span> . = ALIGN(4K); </span><span> } </span><span> </span><span> .bss : </span><span> { </span><span> *(.bss .bss.*) </span><span> . = ALIGN(4K); </span><span> } </span><span> </span><span> .got : </span><span> { </span><span> *(.got) </span><span> . = ALIGN(4K); </span><span> } </span><span> </span><span> .got.plt : </span><span> { </span><span> *(.got.plt) </span><span> . = ALIGN(4K); </span><span> } </span><span> </span><span> .data.rel.ro : ALIGN(4K) { </span><span> *(.data.rel.ro.local*) *(.data.rel.ro .data.rel.ro.*) </span><span> . = ALIGN(4K); </span><span> } </span><span> </span><span> .gcc_except_table : ALIGN(4K) { </span><span> *(.gcc_except_table) </span><span> . = ALIGN(4K); </span><span> } </span><span>} </span></code></pre> <p>Instead of page aligning the <code>.multiboot_header</code> section, we merge it into the <code>.rodata</code> section. That way, we don’t waste a whole page for the few bytes of the Multiboot header. We could merge it into any section, but <code>.rodata</code> fits best because it has the same flags (neither writable nor executable). 
The Multiboot header still needs to be at the beginning of the file, so <code>.rodata</code> must be our first section now.</p> <h3 id="testing-it"><a class="zola-anchor" href="#testing-it" aria-label="Anchor link for: testing-it">🔗</a>Testing it</h3> <p>Time to test it! We re-export the <code>remap_the_kernel</code> function from the memory module and call it from <code>rust_main</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/mod.rs </span><span style="color:#569cd6;">pub use </span><span>self::paging::remap_the_kernel; </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>rust_main(multiboot_information_address: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> </span><span style="color:#608b4e;">// ATTENTION: we have a very small stack and no guard page </span><span> </span><span> </span><span style="color:#608b4e;">// the same as before </span><span> vga_buffer::clear_screen(); </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> boot_info = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> multiboot2::load(multiboot_information_address) </span><span> }; </span><span> </span><span style="color:#569cd6;">let</span><span> memory_map_tag = boot_info.memory_map_tag() </span><span> 
.expect(</span><span style="color:#d69d85;">&quot;Memory map tag required&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">let</span><span> elf_sections_tag = boot_info.elf_sections_tag() </span><span> .expect(</span><span style="color:#d69d85;">&quot;Elf sections tag required&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> kernel_start = elf_sections_tag.sections().map(|s| s.addr) </span><span> .min().unwrap(); </span><span> </span><span style="color:#569cd6;">let</span><span> kernel_end = elf_sections_tag.sections().map(|s| s.addr + s.size) </span><span> .max().unwrap(); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> multiboot_start = multiboot_information_address; </span><span> </span><span style="color:#569cd6;">let</span><span> multiboot_end = multiboot_start + (boot_info.total_size </span><span style="color:#569cd6;">as usize</span><span>); </span><span> </span><span> println!(</span><span style="color:#d69d85;">&quot;kernel start: 0x</span><span style="color:#b4cea8;">{:x}</span><span style="color:#d69d85;">, kernel end: 0x</span><span style="color:#b4cea8;">{:x}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span> kernel_start, kernel_end); </span><span> println!(</span><span style="color:#d69d85;">&quot;multiboot start: 0x</span><span style="color:#b4cea8;">{:x}</span><span style="color:#d69d85;">, multiboot end: 0x</span><span style="color:#b4cea8;">{:x}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span> multiboot_start, multiboot_end); </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> frame_allocator = memory::AreaFrameAllocator::new( </span><span> kernel_start </span><span style="color:#569cd6;">as usize</span><span>, kernel_end </span><span style="color:#569cd6;">as usize</span><span>, multiboot_start, </span><span> multiboot_end, memory_map_tag.memory_areas()); 
</span><span> </span><span> </span><span style="color:#608b4e;">// this is the new part </span><span> memory::remap_the_kernel(</span><span style="color:#569cd6;">&amp;mut</span><span> frame_allocator, boot_info); </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">loop </span><span>{} </span><span>} </span></code></pre> <p>If you see the <code>It did not crash</code> message, the kernel survived our page table modifications without causing a CPU exception. But did we map the kernel sections correctly?</p> <p>Let’s try it out by switching to the new table! We identity map all kernel sections, so it should work without problems.</p> <h2 id="switching-tables"><a class="zola-anchor" href="#switching-tables" aria-label="Anchor link for: switching-tables">🔗</a>Switching Tables</h2> <p>Switching tables is easy. We just need to reload the <code>CR3</code> register with the physical address of the new P4 frame.</p> <p>We do this in a new <code>ActivePageTable::switch</code> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in `impl ActivePageTable` in src/memory/paging/mod.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>switch(</span><span style="color:#569cd6;">&amp;mut </span><span>self, new_table: InactivePageTable) -&gt; InactivePageTable { </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::PhysicalAddress; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::registers::control_regs; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> old_table = InactivePageTable { </span><span> p4_frame: Frame::containing_address( </span><span> control_regs::cr3().</span><span style="color:#b5cea8;">0 </span><span style="color:#569cd6;">as 
usize </span><span> ), </span><span> }; </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> control_regs::cr3_write(PhysicalAddress( </span><span> new_table.p4_frame.start_address() </span><span style="color:#569cd6;">as u64</span><span>)); </span><span> } </span><span> old_table </span><span>} </span></code></pre> <p>This function activates the given inactive table and returns the previous active table as an <code>InactivePageTable</code>. We don’t need to flush the TLB here, as the CPU does it automatically when the P4 table is switched. In fact, the <code>tlb::flush_all</code> function, which we used above, does nothing more than <a href="https://docs.rs/x86_64/0.1.2/src/x86_64/instructions/tlb.rs.html#11-14">reloading the CR3 register</a>.</p> <p>Now we are finally able to switch to the new table. We do it by adding the following lines to our <code>remap_the_kernel</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in remap_the_kernel in src/memory/paging/mod.rs </span><span> </span><span style="color:#569cd6;">... </span><span>active_table.with(</span><span style="color:#569cd6;">&amp;mut</span><span> new_table, </span><span style="color:#569cd6;">&amp;mut</span><span> temporary_page, |mapper| { </span><span> </span><span style="color:#569cd6;">... </span><span>}); </span><span> </span><span style="color:#569cd6;">let</span><span> old_table = active_table.switch(new_table); </span><span>println!(</span><span style="color:#d69d85;">&quot;NEW TABLE!!!&quot;</span><span>); </span></code></pre> <p>Let’s cross our fingers and run it…</p> <p>… and it fails with a boot loop.</p> <h3 id="debugging"><a class="zola-anchor" href="#debugging" aria-label="Anchor link for: debugging">🔗</a>Debugging</h3> <p>A QEMU boot loop indicates that some CPU exception occurred.
We can see all thrown CPU exceptions by starting QEMU with <code>-d int</code>:</p> <pre data-lang="bash" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-bash "><code class="language-bash" data-lang="bash"><span>&gt; qemu-system-x86_64 -d int -no-reboot -cdrom build/os-x86_64.iso </span><span>... </span><span>check_exception old: 0xffffffff new 0xe </span><span> 0: v=0e e=0002 i=0 cpl=0 IP=0008:000000000010ab97 pc=000000000010ab97 </span><span> SP=</span><span style="background-color:#282828;color:#d69d85;">0010:00000000001182d0</span><span> CR2=</span><span style="background-color:#282828;color:#d69d85;">00000000000b8f00</span><span> </span><span>... </span></code></pre> <p>These lines are the important ones. We can read a lot of useful information from them:</p> <ul> <li> <p><code>v=0e</code>: An exception with number <code>0xe</code> occurred, which is a page fault according to the <a href="https://wiki.osdev.org/Exceptions">OSDev Wiki</a>.</p> </li> <li> <p><code>e=0002</code>: The CPU set an <a href="https://wiki.osdev.org/Exceptions#Error_code">error code</a>, which tells us why the exception occurred. The <code>0x2</code> bit tells us that it was caused by a write operation. And since the <code>0x1</code> bit is not set, the target page was not present.</p> </li> <li> <p><code>IP=0008:000000000010ab97</code> or <code>pc=000000000010ab97</code>: The program counter register tells us that the exception occurred when the CPU tried to execute the instruction at <code>0x10ab97</code>. We can disassemble this address to see the corresponding function. The <code>0008:</code> prefix in <code>IP</code> indicates the code <a href="https://os.phil-opp.com/entering-longmode/#loading-the-gdt">GDT segment</a>.</p> </li> <li> <p><code>SP=0010:00000000001182d0</code>: The stack pointer was <code>0x1182d0</code> (the <code>0010:</code> prefix indicates the data <a href="https://os.phil-opp.com/entering-longmode/#loading-the-gdt">GDT segment</a>).
This tells us whether the stack overflowed.</p> </li> <li> <p><code>CR2=00000000000b8f00</code>: Finally the most useful register. It tells us which virtual address caused the page fault. In our case it’s <code>0xb8f00</code>, which is part of the <a href="https://os.phil-opp.com/printing-to-screen/#the-vga-text-buffer">VGA text buffer</a>.</p> </li> </ul> <p>So let’s find out which function caused the exception:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>objdump -d build/kernel-x86_64.bin | grep -B100 &quot;10ab97&quot; </span></code></pre> <p>We disassemble our kernel and search for <code>10ab97</code>. The <code>-B100</code> option prints the 100 preceding lines too. The output tells us the responsible function:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>... </span><span>000000000010aa80 &lt;_ZN10vga_buffer6Writer10write_byte20h4601f5e405b6e89facaE&gt;: </span><span> 10aa80: 55 push %rbp </span><span> ... </span><span> 10ab93: 66 8b 55 aa mov -0x56(%rbp),%dx </span><span> 10ab97: 66 89 14 48 mov %dx,(%rax,%rcx,2) </span></code></pre> <p>The reason for the cryptic function name is Rust’s <a href="https://en.wikipedia.org/wiki/Name_mangling">name mangling</a>. But we can identify the <code>vga_buffer::Writer::write_byte</code> function nonetheless.</p> <p>So the reason for the page fault is that the <code>write_byte</code> function tried to write to the VGA text buffer at <code>0xb8f00</code>.
Of course this provokes a page fault: We forgot to identity map the VGA buffer in the new page table.</p> <p>The fix is pretty simple:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/paging/mod.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>remap_the_kernel&lt;A&gt;(allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A, boot_info: </span><span style="color:#569cd6;">&amp;</span><span>BootInformation) </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{ </span><span> </span><span style="color:#569cd6;">... </span><span> active_table.with(</span><span style="color:#569cd6;">&amp;mut</span><span> new_table, </span><span style="color:#569cd6;">&amp;mut</span><span> temporary_page, |mapper| { </span><span> </span><span style="color:#569cd6;">... </span><span> </span><span style="color:#569cd6;">for</span><span> section </span><span style="color:#569cd6;">in</span><span> elf_sections_tag.sections() { </span><span> </span><span style="color:#569cd6;">... 
</span><span> } </span><span> </span><span> </span><span style="color:#608b4e;">// identity map the VGA text buffer </span><span> </span><span style="color:#569cd6;">let</span><span> vga_buffer_frame = Frame::containing_address(</span><span style="color:#b5cea8;">0xb8000</span><span>); </span><span style="color:#608b4e;">// new </span><span> mapper.identity_map(vga_buffer_frame, </span><span style="color:#b4cea8;">WRITABLE</span><span>, allocator); </span><span style="color:#608b4e;">// new </span><span> }); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> old_table = active_table.switch(new_table); </span><span> println!(</span><span style="color:#d69d85;">&quot;NEW TABLE!!!&quot;</span><span>); </span><span>} </span></code></pre> <p>Now we should see the <code>NEW TABLE!!!</code> message (and also the <code>It did not crash!</code> line again). Congratulations! We successfully switched our kernel to a new page table!</p> <h3 id="fixing-the-frame-allocator"><a class="zola-anchor" href="#fixing-the-frame-allocator" aria-label="Anchor link for: fixing-the-frame-allocator">🔗</a>Fixing the Frame Allocator</h3> <p>The same problem as above occurs when we try to use our <a href="https://os.phil-opp.com/allocating-frames/#the-allocator">AreaFrameAllocator</a> again. Try to add the following to <code>rust_main</code> after switching to the new table:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span style="color:#569cd6;">pub extern </span><span style="color:#d69d85;">&quot;C&quot; </span><span style="color:#569cd6;">fn </span><span>rust_main(multiboot_information_address: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> </span><span style="color:#569cd6;">... 
</span><span> memory::remap_the_kernel(</span><span style="color:#569cd6;">&amp;mut</span><span> frame_allocator, boot_info); </span><span> frame_allocator.allocate_frame(); </span><span style="color:#608b4e;">// new: try to allocate a frame </span><span> println!(</span><span style="color:#d69d85;">&quot;It did not crash!&quot;</span><span>); </span></code></pre> <p>This causes the same bootloop as above. The reason is that the <code>AreaFrameAllocator</code> uses the memory map of the Multiboot information structure. But we did not map the Multiboot structure, so it causes a page fault. To fix it, we identity map it as well:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in `remap_the_kernel` in src/memory/paging/mod.rs </span><span>active_table.with(</span><span style="color:#569cd6;">&amp;mut</span><span> new_table, </span><span style="color:#569cd6;">&amp;mut</span><span> temporary_page, |mapper| { </span><span> </span><span> </span><span style="color:#608b4e;">// … identity map the allocated kernel sections </span><span> </span><span style="color:#608b4e;">// … identity map the VGA text buffer </span><span> </span><span> </span><span style="color:#608b4e;">// new: </span><span> </span><span style="color:#608b4e;">// identity map the multiboot info structure </span><span> </span><span style="color:#569cd6;">let</span><span> multiboot_start = Frame::containing_address(boot_info.start_address()); </span><span> </span><span style="color:#569cd6;">let</span><span> multiboot_end = Frame::containing_address(boot_info.end_address() - </span><span style="color:#b5cea8;">1</span><span>); </span><span> </span><span style="color:#569cd6;">for</span><span> frame </span><span style="color:#569cd6;">in </span><span>Frame::range_inclusive(multiboot_start, multiboot_end) { </span><span> mapper.identity_map(frame, </span><span 
style="color:#b4cea8;">PRESENT</span><span>, allocator); </span><span> } </span><span>}); </span></code></pre> <p>Normally the multiboot struct fits on one page. But GRUB can place it anywhere, so it might happen to cross a page boundary. Therefore we use <code>range_inclusive</code> to be on the safe side. Note that we need to subtract 1 to get the address of the last byte because the end address is exclusive.</p> <p>Now we should be able to allocate frames again.</p> <h2 id="using-the-correct-flags"><a class="zola-anchor" href="#using-the-correct-flags" aria-label="Anchor link for: using-the-correct-flags">🔗</a>Using the Correct Flags</h2> <p>Right now, our new table maps all kernel sections as writable and executable. To fix this, we add an <code>EntryFlags::from_elf_section_flags</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/paging/entry.rs </span><span> </span><span style="color:#569cd6;">use </span><span>multiboot2::ElfSection; </span><span> </span><span style="color:#569cd6;">impl </span><span>EntryFlags { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>from_elf_section_flags(section: </span><span style="color:#569cd6;">&amp;</span><span>ElfSection) -&gt; EntryFlags { </span><span> </span><span style="color:#569cd6;">use </span><span>multiboot2::{</span><span style="color:#b4cea8;">ELF_SECTION_ALLOCATED</span><span>, </span><span style="color:#b4cea8;">ELF_SECTION_WRITABLE</span><span>, </span><span> </span><span style="color:#b4cea8;">ELF_SECTION_EXECUTABLE</span><span>}; </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> flags = EntryFlags::empty(); </span><span> </span><span> </span><span style="color:#569cd6;">if</span><span> section.flags().contains(</span><span style="color:#b4cea8;">ELF_SECTION_ALLOCATED</span><span>) { </span><span> 
</span><span style="color:#608b4e;">// section is loaded to memory </span><span> flags = flags </span><span style="color:#569cd6;">| </span><span style="color:#b4cea8;">PRESENT</span><span>; </span><span> } </span><span> </span><span style="color:#569cd6;">if</span><span> section.flags().contains(</span><span style="color:#b4cea8;">ELF_SECTION_WRITABLE</span><span>) { </span><span> flags = flags </span><span style="color:#569cd6;">| </span><span style="color:#b4cea8;">WRITABLE</span><span>; </span><span> } </span><span> </span><span style="color:#569cd6;">if !</span><span>section.flags().contains(</span><span style="color:#b4cea8;">ELF_SECTION_EXECUTABLE</span><span>) { </span><span> flags = flags </span><span style="color:#569cd6;">| </span><span style="color:#b4cea8;">NO_EXECUTE</span><span>; </span><span> } </span><span> </span><span> flags </span><span> } </span><span>} </span></code></pre> <p>It just converts the ELF section flags to page table flags.</p> <p>Now we can use it to fix the <code>TODO</code> in our <code>remap_the_kernel</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/paging/mod.rs </span><span> </span><span style="color:#569cd6;">pub fn </span><span>remap_the_kernel&lt;A&gt;(allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A, boot_info: </span><span style="color:#569cd6;">&amp;</span><span>BootInformation) </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{ </span><span> </span><span style="color:#569cd6;">... </span><span> active_table.with(</span><span style="color:#569cd6;">&amp;mut</span><span> new_table, </span><span style="color:#569cd6;">&amp;mut</span><span> temporary_page, |mapper| { </span><span> </span><span style="color:#569cd6;">... 
</span><span> </span><span style="color:#569cd6;">for</span><span> section </span><span style="color:#569cd6;">in</span><span> elf_sections_tag.sections() { </span><span> </span><span style="color:#569cd6;">... </span><span> </span><span style="color:#569cd6;">if !</span><span>section.is_allocated() { </span><span> </span><span style="color:#608b4e;">// section is not loaded to memory </span><span> </span><span style="color:#569cd6;">continue</span><span>; </span><span> } </span><span> </span><span style="color:#569cd6;">... </span><span> </span><span style="color:#608b4e;">// this is the new part </span><span> </span><span style="color:#569cd6;">let</span><span> flags = EntryFlags::from_elf_section_flags(section); </span><span> </span><span style="color:#569cd6;">... </span><span> </span><span style="color:#569cd6;">for</span><span> frame </span><span style="color:#569cd6;">in </span><span>Frame::range_inclusive(start_frame, end_frame) { </span><span> mapper.identity_map(frame, flags, allocator); </span><span> } </span><span> } </span><span> </span><span style="color:#569cd6;">... </span><span> }); </span><span> </span><span style="color:#569cd6;">... </span><span>} </span></code></pre> <p>But when we test it now, we get a page fault again. We can use the same technique as above to find the responsible function. I won’t bother you with the QEMU output and just tell you the results:</p> <p>This time the responsible function is <code>control_regs::cr3_write()</code> itself. From the <a href="https://wiki.osdev.org/Exceptions#Error_code">error code</a> we learn that it was a page protection violation caused by “reading a 1 in a reserved field”. So the page table had some reserved bit set that should always be 0. 
It must be the <code>NO_EXECUTE</code> flag, since it’s the only new bit that we set in the page table.</p> <h3 id="the-nxe-bit"><a class="zola-anchor" href="#the-nxe-bit" aria-label="Anchor link for: the-nxe-bit">🔗</a>The NXE Bit</h3> <p>The reason is that the <code>NO_EXECUTE</code> bit must only be used when the <code>NXE</code> bit in the <a href="https://en.wikipedia.org/wiki/Control_register#EFER">Extended Feature Enable Register</a> (EFER) is set. That register is similar to Rust’s feature gating and can be used to enable all sorts of advanced CPU features. Since the <code>NXE</code> bit is off by default, we caused a page fault when we added the <code>NO_EXECUTE</code> bit to the page table.</p> <p>So we need to enable the <code>NXE</code> bit. For that we use the <a href="https://docs.rs/x86_64">x86_64 crate</a> again:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in lib.rs </span><span> </span><span style="color:#569cd6;">fn </span><span>enable_nxe_bit() { </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::registers::msr::{</span><span style="color:#b4cea8;">IA32_EFER</span><span>, rdmsr, wrmsr}; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> nxe_bit = </span><span style="color:#b5cea8;">1 </span><span>&lt;&lt; </span><span style="color:#b5cea8;">11</span><span>; </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> efer = rdmsr(</span><span style="color:#b4cea8;">IA32_EFER</span><span>); </span><span> wrmsr(</span><span style="color:#b4cea8;">IA32_EFER</span><span>, efer </span><span style="color:#569cd6;">|</span><span> nxe_bit); </span><span> } </span><span>} </span></code></pre> <p>The unsafe block is needed since accessing the <code>EFER</code> register is only allowed in kernel mode. 
But we are in kernel mode, so everything is fine.</p> <p>When we call this function before calling <code>remap_the_kernel</code>, everything should work again.</p> <h3 id="the-write-protect-bit"><a class="zola-anchor" href="#the-write-protect-bit" aria-label="Anchor link for: the-write-protect-bit">🔗</a>The Write Protect Bit</h3> <p>Right now, we are still able to modify the <code>.text</code> and <code>.rodata</code> sections, even though we did not set the <code>WRITABLE</code> flag for them. The reason is that the CPU ignores this bit in kernel mode by default. To enable write protection for the kernel as well, we need to set the <em>Write Protect</em> bit in the <code>CR0</code> register:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in lib.rs </span><span> </span><span style="color:#569cd6;">fn </span><span>enable_write_protect_bit() { </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::registers::control_regs::{cr0, cr0_write, Cr0}; </span><span> </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ cr0_write(cr0() </span><span style="color:#569cd6;">| </span><span>Cr0::</span><span style="color:#b4cea8;">WRITE_PROTECT</span><span>) }; </span><span>} </span></code></pre> <p>The <code>cr0</code> functions are unsafe because accessing the <code>CR0</code> register is only allowed in kernel mode.</p> <p>If we haven’t forgotten to set the <code>WRITABLE</code> flag somewhere, it should still work without crashing.</p> <h2 id="creating-a-guard-page"><a class="zola-anchor" href="#creating-a-guard-page" aria-label="Anchor link for: creating-a-guard-page">🔗</a>Creating a Guard Page</h2> <p>The final step is to create a guard page for our kernel stack.</p> <p>The decision to place the kernel stack right above the page tables was already useful to detect a silent stack overflow in the <a 
href="https://os.phil-opp.com/page-tables/">previous post</a>. Now we profit from it again. Let’s look at our assembly <code>.bss</code> section again to understand why:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span style="color:#608b4e;">; in src/arch/x86_64/boot.asm </span><span> </span><span>section .bss </span><span>align </span><span style="color:#b4cea8;">4096 </span><span>p4_table: </span><span> resb </span><span style="color:#b4cea8;">4096 </span><span>p3_table: </span><span> resb </span><span style="color:#b4cea8;">4096 </span><span>p2_table: </span><span> resb </span><span style="color:#b4cea8;">4096 </span><span>stack_bottom: </span><span> resb </span><span style="color:#b4cea8;">4096 </span><span>* </span><span style="color:#b4cea8;">4 </span><span>stack_top: </span></code></pre> <p>The old page tables are right below the stack. They are still identity mapped since they are part of the kernel’s <code>.bss</code> section. We just need to turn the old <code>p4_table</code> into a guard page to secure the kernel stack. That way we even reuse the memory of the old P3 and P2 tables to increase the stack size.</p> <p>So let’s implement it:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/memory/paging/mod.rs </span><span style="color:#569cd6;">pub fn </span><span>remap_the_kernel&lt;A&gt;(allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A, boot_info: </span><span style="color:#569cd6;">&amp;</span><span>BootInformation) </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{ </span><span> </span><span style="color:#569cd6;">... 
</span><span> </span><span style="color:#569cd6;">let</span><span> old_table = active_table.switch(new_table); </span><span> println!(</span><span style="color:#d69d85;">&quot;NEW TABLE!!!&quot;</span><span>); </span><span> </span><span> </span><span style="color:#608b4e;">// below is the new part </span><span> </span><span> </span><span style="color:#608b4e;">// turn the old p4 page into a guard page </span><span> </span><span style="color:#569cd6;">let</span><span> old_p4_page = Page::containing_address( </span><span> old_table.p4_frame.start_address() </span><span> ); </span><span> active_table.unmap(old_p4_page, allocator); </span><span> println!(</span><span style="color:#d69d85;">&quot;guard page at </span><span style="color:#b4cea8;">{:#x}</span><span style="color:#d69d85;">&quot;</span><span>, old_p4_page.start_address()); </span><span>} </span></code></pre> <p>Now we have a very basic guard page: The page below the stack is unmapped, so a stack overflow causes an immediate page fault. Thus, silent stack overflows are no longer possible.</p> <p>Or to be precise, they are improbable. If we have a function with many big stack variables, it’s possible that the guard page is missed. For example, the following function could still corrupt memory below the stack:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>stack_overflow() { </span><span> </span><span style="color:#569cd6;">let</span><span> x = [</span><span style="color:#b5cea8;">0</span><span>; </span><span style="color:#b5cea8;">99999</span><span>]; </span><span>} </span></code></pre> <p>This creates a very big array on the stack, which is currently filled from bottom to top. Therefore it misses the guard page and overwrites some memory below the stack. Eventually it hits the bottom of the guard page and causes a page fault. 
But before that, it corrupts memory, which is bad.</p> <p>Fortunately, there exists a solution called <em>stack probes</em>. The basic idea is to check all required stack pages at the beginning of each function. For example, a function that needs 9000 bytes on the stack would try to access <code>SP + 0</code>, <code>SP + 4096</code>, and <code>SP + 2 * 4096</code> (<code>SP</code> is the stack pointer). If the stack is not big enough, the guard page is hit and a page fault occurs. The function can’t mess up memory anymore since the stack check occurs right at its start.</p> <p>Unfortunately, stack probes require compiler support. They already work on Windows but they don’t exist on Linux yet. The problem seems to be in LLVM, which Rust uses as its backend. Hopefully it gets resolved soon so that our kernel stack becomes safe. For the current status and more information about stack probes, check out the <a href="https://github.com/rust-lang/rust/issues/16012#issuecomment-160380183">tracking issue</a>.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>Now that we have a (mostly) safe kernel stack and a working page table module, we can add a virtual memory allocator. The <a href="https://os.phil-opp.com/kernel-heap/">next post</a> will explore Rust’s allocator API and create a very basic allocator. 
At the end of that post, we will be able to use Rust’s allocation and collection types such as <a href="https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html">Box</a>, <a href="https://doc.rust-lang.org/1.10.0/collections/vec/struct.Vec.html">Vec</a>, or even <a href="https://doc.rust-lang.org/1.10.0/collections/btree_map/struct.BTreeMap.html">BTreeMap</a>.</p> <h2 id="footnotes"><a class="zola-anchor" href="#footnotes" aria-label="Anchor link for: footnotes">🔗</a>Footnotes</h2> <div class="footnote-definition" id="fn-debug-notes"><sup class="footnote-definition-label">1</sup> <p>For this post, the most useful GDB command is probably <code>p/x *((long int*)0xfffffffffffff000)@512</code>. It prints all entries of the recursively mapped P4 table by interpreting it as an array of 512 long ints (the <code>@512</code> is GDB’s array syntax). Of course you can also print other tables by adjusting the address.</p> </div> Page Tables Wed, 09 Dec 2015 00:00:00 +0000 https://os.phil-opp.com/page-tables/ https://os.phil-opp.com/page-tables/ <p>In this post we will create a paging module, which allows us to access and modify the 4-level page table. We will explore recursive page table mapping and use some Rust features to make it safe. Finally, we will create functions to translate virtual addresses and to map and unmap pages.</p> <span id="continue-reading"></span> <p>You can find the source code and this post itself on <a href="https://github.com/phil-opp/blog_os/tree/first_edition_post_6">GitHub</a>. Please file an issue there if you have any problems or improvement suggestions. There is also a comment section at the end of this page. Note that this post requires a current Rust nightly.</p> <h2 id="paging"><a class="zola-anchor" href="#paging" aria-label="Anchor link for: paging">🔗</a>Paging</h2> <p><em>Paging</em> is a memory management scheme that separates virtual and physical memory. 
The address space is split into equal-sized <em>pages</em>, and <em>page tables</em> specify which virtual page points to which physical frame. For an extensive paging introduction, take a look at the paging chapter (<a href="http://pages.cs.wisc.edu/~remzi/OSTEP/vm-paging.pdf">PDF</a>) of the <a href="http://pages.cs.wisc.edu/~remzi/OSTEP/">Three Easy Pieces</a> OS book.</p> <p>The x86 architecture uses a 4-level page table in 64-bit mode. A virtual address has the following structure:</p> <p><img src="https://os.phil-opp.com/page-tables/x86_address_structure.svg" alt="structure of a virtual address on x86" /></p> <p>The bits 48–63 are so-called <em>sign extension</em> bits and must be copies of bit 47. The following 36 bits define the page table indexes (9 bits per table) and the last 12 bits specify the offset in the 4 KiB page.</p> <p>Each table has 2^9 = 512 entries and each entry is 8 bytes. Thus a page table fits exactly in one page (512 × 8 = 4096 bytes = 4 KiB).</p> <p>To translate an address, the CPU reads the P4 address from the CR3 register. Then it uses the indexes to walk the tables:</p> <p><img src="https://os.phil-opp.com/page-tables/X86_Paging_64bit.svg" alt="translation of virtual to physical addresses in 64 bit mode" /></p> <p>The P4 entry points to a P3 table, where the next 9 bits of the address are used to select an entry. The P3 entry then points to a P2 table and the P2 entry points to a P1 table. 
The P1 entry, which is specified through bits 12–20, finally points to the physical frame.</p> <h2 id="a-basic-paging-module"><a class="zola-anchor" href="#a-basic-paging-module" aria-label="Anchor link for: a-basic-paging-module">🔗</a>A Basic Paging Module</h2> <p>Let’s create a basic paging module in <code>memory/paging/mod.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>memory::</span><span style="color:#b4cea8;">PAGE_SIZE</span><span>; </span><span style="color:#608b4e;">// needed later </span><span> </span><span style="color:#569cd6;">const </span><span style="color:#b4cea8;">ENTRY_COUNT</span><span>: </span><span style="color:#569cd6;">usize </span><span>= </span><span style="color:#b5cea8;">512</span><span>; </span><span> </span><span style="color:#569cd6;">pub type </span><span style="color:#4ec9b0;">PhysicalAddress </span><span>= </span><span style="color:#569cd6;">usize</span><span>; </span><span style="color:#569cd6;">pub type </span><span style="color:#4ec9b0;">VirtualAddress </span><span>= </span><span style="color:#569cd6;">usize</span><span>; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>Page { </span><span> number: </span><span style="color:#569cd6;">usize</span><span>, </span><span>} </span></code></pre> <p>We import the <code>PAGE_SIZE</code> and define a constant for the number of entries per table. To make future function signatures more expressive, we can use the type aliases <code>PhysicalAddress</code> and <code>VirtualAddress</code>. 
The <code>Page</code> struct is similar to the <code>Frame</code> struct in the <a href="https://os.phil-opp.com/allocating-frames/#a-memory-module">previous post</a>, but represents a virtual page instead of a physical frame.</p> <h3 id="page-table-entries"><a class="zola-anchor" href="#page-table-entries" aria-label="Anchor link for: page-table-entries">🔗</a>Page Table Entries</h3> <p>To model page table entries, we create a new <code>entry</code> submodule:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>memory::Frame; </span><span style="color:#608b4e;">// needed later </span><span> </span><span style="color:#569cd6;">pub struct </span><span>Entry(</span><span style="color:#569cd6;">u64</span><span>); </span><span> </span><span style="color:#569cd6;">impl </span><span>Entry { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>is_unused(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; </span><span style="color:#569cd6;">bool </span><span>{ </span><span> self.</span><span style="color:#b5cea8;">0 </span><span>== </span><span style="color:#b5cea8;">0 </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>set_unused(</span><span style="color:#569cd6;">&amp;mut </span><span>self) { </span><span> self.</span><span style="color:#b5cea8;">0 </span><span>= </span><span style="color:#b5cea8;">0</span><span>; </span><span> } </span><span>} </span></code></pre> <p>We define that an unused entry is completely 0. That allows us to distinguish unused entries from other non-present entries in the future. For example, we could define one of the available bits as the <code>swapped_out</code> bit for pages that are swapped to disk.</p> <p>Next we will model the contained physical address and the various flags. 
Remember, entries have the following format:</p> <table><thead><tr><th>Bit(s)</th><th>Name</th><th>Meaning</th></tr></thead><tbody> <tr><td>0</td><td>present</td><td>the page is currently in memory</td></tr> <tr><td>1</td><td>writable</td><td>it’s allowed to write to this page</td></tr> <tr><td>2</td><td>user accessible</td><td>if not set, only kernel mode code can access this page</td></tr> <tr><td>3</td><td>write through caching</td><td>writes go directly to memory</td></tr> <tr><td>4</td><td>disable cache</td><td>no cache is used for this page</td></tr> <tr><td>5</td><td>accessed</td><td>the CPU sets this bit when this page is used</td></tr> <tr><td>6</td><td>dirty</td><td>the CPU sets this bit when a write to this page occurs</td></tr> <tr><td>7</td><td>huge page/null</td><td>must be 0 in P1 and P4, creates a 1 GiB page in P3, creates a 2 MiB page in P2</td></tr> <tr><td>8</td><td>global</td><td>page isn’t flushed from caches on address space switch (PGE bit of CR4 register must be set)</td></tr> <tr><td>9–11</td><td>available</td><td>can be used freely by the OS</td></tr> <tr><td>12–51</td><td>physical address</td><td>the page-aligned 52-bit physical address of the frame or the next page table</td></tr> <tr><td>52–62</td><td>available</td><td>can be used freely by the OS</td></tr> <tr><td>63</td><td>no execute</td><td>forbid executing code on this page (the NXE bit in the EFER register must be set)</td></tr> </tbody></table> <p>To model the various flags, we will use the <a href="https://github.com/rust-lang-nursery/bitflags">bitflags</a> crate. To add it as a dependency, add the following to your <code>Cargo.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#ff3333;">... 
</span><span style="color:#569cd6;">bitflags </span><span>= </span><span style="color:#d69d85;">&quot;0.9.1&quot; </span></code></pre> <p>To import the macro, we need to use <code>#[macro_use]</code> above the <code>extern crate</code> definition:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span>#[macro_use] </span><span style="color:#569cd6;">extern crate</span><span> bitflags; </span></code></pre> <p>Now we can model the various flags:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>bitflags! { </span><span> </span><span style="color:#569cd6;">pub struct </span><span>EntryFlags: u64 { </span><span> const PRESENT = 1 &lt;&lt; 0; </span><span> const WRITABLE = 1 &lt;&lt; 1; </span><span> const USER_ACCESSIBLE = 1 &lt;&lt; 2; </span><span> const WRITE_THROUGH = 1 &lt;&lt; 3; </span><span> const NO_CACHE = 1 &lt;&lt; 4; </span><span> const ACCESSED = 1 &lt;&lt; 5; </span><span> const DIRTY = 1 &lt;&lt; 6; </span><span> const HUGE_PAGE = 1 &lt;&lt; 7; </span><span> const GLOBAL = 1 &lt;&lt; 8; </span><span> const NO_EXECUTE = 1 &lt;&lt; 63; </span><span> } </span><span>} </span></code></pre> <p>To extract the flags from the entry we create an <code>Entry::flags</code> method that uses <a href="https://docs.rs/bitflags/0.9.1/bitflags/example_generated/struct.Flags.html#method.from_bits_truncate">from_bits_truncate</a>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>flags(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; EntryFlags { </span><span> EntryFlags::from_bits_truncate(self.</span><span style="color:#b5cea8;">0</span><span>) </span><span>} 
</span></code></pre> <p>This allows us to check for flags through the <code>contains()</code> function. For example, <code>flags().contains(PRESENT | WRITABLE)</code> returns true if the entry contains <em>both</em> flags.</p> <p>To extract the physical address, we add a <code>pointed_frame</code> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>pointed_frame(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; Option&lt;Frame&gt; { </span><span> </span><span style="color:#569cd6;">if </span><span>self.flags().contains(</span><span style="color:#b4cea8;">PRESENT</span><span>) { </span><span> Some(Frame::containing_address( </span><span> self.</span><span style="color:#b5cea8;">0 </span><span style="color:#569cd6;">as usize &amp; </span><span style="color:#b5cea8;">0x000fffff_fffff000 </span><span> )) </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> None </span><span> } </span><span>} </span></code></pre> <p>If the entry is present, we mask bits 12–51 and return the corresponding frame. 
If the entry is not present, it does not point to a valid frame, so we return <code>None</code>.</p> <p>To modify entries, we add a <code>set</code> method that updates the flags and the pointed frame:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>set(</span><span style="color:#569cd6;">&amp;mut </span><span>self, frame: Frame, flags: EntryFlags) { </span><span> assert!(frame.start_address() </span><span style="color:#569cd6;">&amp; !</span><span style="color:#b5cea8;">0x000fffff_fffff000 </span><span>== </span><span style="color:#b5cea8;">0</span><span>); </span><span> self.</span><span style="color:#b5cea8;">0 </span><span>= (frame.start_address() </span><span style="color:#569cd6;">as u64</span><span>) </span><span style="color:#569cd6;">|</span><span> flags.bits(); </span><span>} </span></code></pre> <p>The start address of a frame should be page-aligned and smaller than 2^52 (since x86 uses 52-bit physical addresses). Since an invalid address could mess up the entry, we add an assertion. 
To actually set the entry, we just need to <code>or</code> the start address and the flag bits.</p> <p>The missing <code>Frame::start_address</code> method is pretty simple:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>self::paging::PhysicalAddress; </span><span> </span><span style="color:#569cd6;">fn </span><span>start_address(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; PhysicalAddress { </span><span> self.number * </span><span style="color:#b4cea8;">PAGE_SIZE </span><span>} </span></code></pre> <p>We add it to the <code>impl Frame</code> block in <code>memory/mod.rs</code>.</p> <h3 id="page-tables"><a class="zola-anchor" href="#page-tables" aria-label="Anchor link for: page-tables">🔗</a>Page Tables</h3> <p>To model page tables, we create a basic <code>Table</code> struct in a new <code>table</code> submodule:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>memory::paging::entry::*; </span><span style="color:#569cd6;">use </span><span>memory::paging::</span><span style="color:#b4cea8;">ENTRY_COUNT</span><span>; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>Table { </span><span> entries: [Entry; ENTRY_COUNT], </span><span>} </span></code></pre> <p>It’s just an array of 512 page table entries.</p> <p>To make the <code>Table</code> indexable itself, we can implement the <code>Index</code> and <code>IndexMut</code> traits:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>core::ops::{Index, IndexMut}; </span><span> </span><span style="color:#569cd6;">impl </span><span>Index&lt;</span><span 
style="color:#569cd6;">usize</span><span>&gt; </span><span style="color:#569cd6;">for </span><span>Table { </span><span> </span><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">Output </span><span>= Entry; </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>index(</span><span style="color:#569cd6;">&amp;</span><span>self, index: </span><span style="color:#569cd6;">usize</span><span>) -&gt; </span><span style="color:#569cd6;">&amp;</span><span>Entry { </span><span> </span><span style="color:#569cd6;">&amp;</span><span>self.entries[index] </span><span> } </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>IndexMut&lt;</span><span style="color:#569cd6;">usize</span><span>&gt; </span><span style="color:#569cd6;">for </span><span>Table { </span><span> </span><span style="color:#569cd6;">fn </span><span>index_mut(</span><span style="color:#569cd6;">&amp;mut </span><span>self, index: </span><span style="color:#569cd6;">usize</span><span>) -&gt; </span><span style="color:#569cd6;">&amp;mut</span><span> Entry { </span><span> </span><span style="color:#569cd6;">&amp;mut </span><span>self.entries[index] </span><span> } </span><span>} </span></code></pre> <p>Now it’s possible to get the 42nd entry through <code>some_table[42]</code>. Of course, we could replace <code>usize</code> with <code>u32</code> or even <code>u16</code> here, but it would cause more numerical conversions (<code>x as u16</code>).</p> <p>Let’s add a method that sets all entries to unused. We will need it when we create new page tables in the future. 
The method looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>zero(</span><span style="color:#569cd6;">&amp;mut </span><span>self) { </span><span> </span><span style="color:#569cd6;">for</span><span> entry </span><span style="color:#569cd6;">in </span><span>self.entries.iter_mut() { </span><span> entry.set_unused(); </span><span> } </span><span>} </span></code></pre> <p>Now we can read page tables and retrieve the mapping information. We can also update them through the <code>IndexMut</code> trait and the <code>Entry::set</code> method. But how do we get references to the various page tables?</p> <p>We could read the <code>CR3</code> register to get the physical address of the P4 table and read its entries to get the P3 addresses. The P3 entries then point to the P2 tables and so on. But this method only works for identity-mapped pages. In the future we will create new page tables, which aren’t in the identity-mapped area anymore. Since we can’t access them through their physical address, we need a way to map them to virtual addresses.</p> <h2 id="mapping-page-tables"><a class="zola-anchor" href="#mapping-page-tables" aria-label="Anchor link for: mapping-page-tables">🔗</a>Mapping Page Tables</h2> <p>So how do we map the page tables themselves? We don’t have that problem for the current P4, P3, and P2 tables, since they are part of the identity-mapped area, but we need a way to access future tables, too.</p> <p>One solution is to identity-map all page tables. That way we would not need to differentiate virtual and physical addresses and could easily access the tables. But it clutters the virtual address space and increases fragmentation. 
And it makes creating page tables much more complicated since we need a physical frame whose corresponding page isn’t already used for something else.</p> <p>An alternative solution is to map the page tables only temporarily. To read/write a page table, we would map it to some free virtual address until we’re done. We could use a small pool of such virtual addresses and reuse them for various tables. This method occupies only a few virtual addresses and thus is a good solution for 32-bit systems, which have small address spaces. But it makes things much more complicated since we need to temporarily map up to 4 tables to access a single page. And the temporary mapping requires modification of other page tables, which need to be mapped, too.</p> <p>We will solve the problem in another way using a trick called <em>recursive mapping</em>.</p> <h3 id="recursive-mapping"><a class="zola-anchor" href="#recursive-mapping" aria-label="Anchor link for: recursive-mapping">🔗</a>Recursive Mapping</h3> <p>The trick is to map the P4 table recursively: The last entry doesn’t point to a P3 table, but to the P4 table itself. We can use this entry to remove a translation level so that we land on a page table instead of a page. For example, we can “loop” once to access a P1 table:</p> <p><img src="https://os.phil-opp.com/page-tables/recursive_mapping_access_p1.svg" alt="access P1 table through recursive paging" /></p> <p>By selecting the 511th P4 entry, which points to the P4 table itself, the P4 table is used as the P3 table. Similarly, the P3 table is used as a P2 table and the P2 table is treated like a P1 table. Thus the P1 table becomes the target page and can be accessed through the offset.</p> <p>It’s also possible to access P2 tables by looping twice. 
And if we select the 511th entry three times, we can access and modify P3 tables:</p> <p><img src="https://os.phil-opp.com/page-tables/recursive_mapping_access_p3.svg" alt="access P3 table through recursive paging" /></p> <p>So we just need to specify the desired P3 table in the address through the P1 index. By choosing the 511th entry multiple times, we stay on the P4 table until the address’s P1 index becomes the actual P4 index.</p> <p>To access the P4 table itself, we loop once more and thus never leave the frame:</p> <p><img src="https://os.phil-opp.com/page-tables/recursive_mapping_access_p4.svg" alt="access P4 table through recursive paging" /></p> <p>So we can access and modify page tables of all levels by just setting one P4 entry once. Most of the work is done by the CPU; we just use the recursive entry to remove one or more translation levels. It may seem a bit strange at first, but it’s a clean and simple solution once you’ve wrapped your head around it.</p> <p>By using recursive mapping, each page table is accessible through a unique virtual address. The math checks out, too: If all page tables are used, there is 1 P4 table, 511 P3 tables (the last entry is used for the recursive mapping), <code>511*512</code> P2 tables, and <code>511*512*512</code> P1 tables. So there are <code>134217728</code> page tables altogether. Each page table occupies 4KiB, so we need <code>134217728 * 4KiB = 512GiB</code> to store them. That’s exactly the amount of memory that can be accessed through one P4 entry since <code>4KiB per page * 512 P1 entries * 512 P2 entries * 512 P3 entries = 512GiB</code>.</p> <p>Of course recursive mapping has some disadvantages, too. It occupies a P4 entry and thus 512GiB of the virtual address space. But since we’re in long mode and have a 48-bit address space (256TiB), there are still 255.5TiB left. The bigger problem is that only the active table can be modified by default. To access another table, the recursive entry needs to be replaced temporarily. 
We will tackle this problem in the next post when we switch to a new page table.</p> <h3 id="implementation"><a class="zola-anchor" href="#implementation" aria-label="Anchor link for: implementation">🔗</a>Implementation</h3> <p>To map the P4 table recursively, we just need to point the 511th entry to the table itself. Of course we could do it in Rust, but it would require some fiddling with unsafe pointers. It’s easier to just add some lines to our boot assembly:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span style="color:#569cd6;">mov </span><span>eax, p4_table </span><span style="color:#569cd6;">or </span><span>eax, 0b11</span><span style="color:#608b4e;"> ; present + writable </span><span style="color:#569cd6;">mov </span><span>[p4_table + </span><span style="color:#b4cea8;">511 </span><span>* </span><span style="color:#b4cea8;">8</span><span>], eax </span></code></pre> <p>I put it right after the <code>set_up_page_tables</code> label, but you can add it wherever you like.</p> <p>Now we can use special virtual addresses to access the page tables. The P4 table is available at <code>0xfffffffffffff000</code>. Let’s add a P4 constant to the <code>table</code> submodule:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub const </span><span style="color:#b4cea8;">P4</span><span>: </span><span style="color:#569cd6;">*mut</span><span> Table = </span><span style="color:#b5cea8;">0xffffffff_fffff000 </span><span style="color:#569cd6;">as *mut _</span><span>; </span></code></pre> <p>Let’s switch to the octal system, since it makes more sense for the other special addresses. The P4 address from above is equivalent to <code>0o177777_777_777_777_777_0000</code> in octal. 
You can see that it has index <code>777</code> in all tables and offset <code>0000</code>. The <code>177777</code> bits on the left are the sign extension bits, which are copies of bit 47. They are required because x86 only uses 48-bit virtual addresses.</p> <p>The other tables can be accessed through the following addresses:</p> <table><thead><tr><th>Table</th><th>Address</th><th>Indexes</th></tr></thead><tbody> <tr><td>P4</td><td><code>0o177777_777_777_777_777_0000</code></td><td>–</td></tr> <tr><td>P3</td><td><code>0o177777_777_777_777_XXX_0000</code></td><td><code>XXX</code> is the P4 index</td></tr> <tr><td>P2</td><td><code>0o177777_777_777_XXX_YYY_0000</code></td><td>like above, and <code>YYY</code> is the P3 index</td></tr> <tr><td>P1</td><td><code>0o177777_777_XXX_YYY_ZZZ_0000</code></td><td>like above, and <code>ZZZ</code> is the P2 index</td></tr> </tbody></table> <p>If we look closely, we can see that the P3 address is equal to <code>(P4 &lt;&lt; 9) | XXX_0000</code>. And the P2 address is calculated through <code>(P3 &lt;&lt; 9) | YYY_0000</code>. So to get the next address, we need to shift the current table address 9 bits to the left and add the table index, which sits just above the 12-bit page offset. 
As a formula:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>next_table_address = (table_address &lt;&lt; 9) | (index &lt;&lt; 12) </span></code></pre> <h3 id="the-next-table-methods"><a class="zola-anchor" href="#the-next-table-methods" aria-label="Anchor link for: the-next-table-methods">🔗</a>The <code>next_table</code> Methods</h3> <p>Let’s add the above formula as a <code>Table</code> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>next_table_address(</span><span style="color:#569cd6;">&amp;</span><span>self, index: </span><span style="color:#569cd6;">usize</span><span>) -&gt; Option&lt;</span><span style="color:#569cd6;">usize</span><span>&gt; { </span><span> </span><span style="color:#569cd6;">let</span><span> entry_flags = self[index].flags(); </span><span> </span><span style="color:#569cd6;">if</span><span> entry_flags.contains(</span><span style="color:#b4cea8;">PRESENT</span><span>) </span><span style="color:#569cd6;">&amp;&amp; !</span><span>entry_flags.contains(</span><span style="color:#b4cea8;">HUGE_PAGE</span><span>) { </span><span> </span><span style="color:#569cd6;">let</span><span> table_address = self </span><span style="color:#569cd6;">as *const _ as usize</span><span>; </span><span> Some((table_address &lt;&lt; </span><span style="color:#b5cea8;">9</span><span>) </span><span style="color:#569cd6;">| </span><span>(index &lt;&lt; </span><span style="color:#b5cea8;">12</span><span>)) </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> None </span><span> } </span><span>} </span></code></pre> <p>The next table address is only valid if the corresponding entry is present and does not create a huge page. 
Then we can do some pointer casting to get the table address and use the formula to calculate the next address.</p> <p>If the index is out of bounds, the function will panic since Rust checks array bounds. The panic is desired here since a wrong index should not be possible and indicates a bug.</p> <p>To convert the address into references, we add two functions:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>next_table(</span><span style="color:#569cd6;">&amp;</span><span>self, index: </span><span style="color:#569cd6;">usize</span><span>) -&gt; Option&lt;</span><span style="color:#569cd6;">&amp;</span><span>Table&gt; { </span><span> self.next_table_address(index) </span><span> .map(|address| </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;</span><span>*(address </span><span style="color:#569cd6;">as *const _</span><span>) }) </span><span>} </span><span> </span><span style="color:#569cd6;">pub fn </span><span>next_table_mut(</span><span style="color:#569cd6;">&amp;mut </span><span>self, index: </span><span style="color:#569cd6;">usize</span><span>) -&gt; Option&lt;</span><span style="color:#569cd6;">&amp;mut</span><span> Table&gt; { </span><span> self.next_table_address(index) </span><span> .map(|address| </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;mut </span><span>*(address </span><span style="color:#569cd6;">as *mut _</span><span>) }) </span><span>} </span></code></pre> <p>We convert the address into raw pointers through <code>as</code> casts and then convert them into Rust references through <code>&amp;mut *</code>. The latter is an <code>unsafe</code> operation since Rust can’t guarantee that the raw pointer is valid.</p> <p>Note that <code>self</code> stays borrowed as long as the returned reference is valid. 
This is because of Rust’s <a href="https://doc.rust-lang.org/1.30.0/book/first-edition/lifetimes.html#lifetime-elision">lifetime elision</a> rules. Basically, these rules say that for a method taking <code>&amp;self</code> or <code>&amp;mut self</code>, the lifetime of an output reference is the same as the lifetime of <code>self</code> by default. So the above function signatures are expanded to:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>next_table&lt;</span><span style="color:#569cd6;">&#39;a</span><span>&gt;(</span><span style="color:#569cd6;">&amp;&#39;a </span><span>self, index: </span><span style="color:#569cd6;">usize</span><span>) -&gt; Option&lt;</span><span style="color:#569cd6;">&amp;&#39;a</span><span> Table&gt; {</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span style="color:#569cd6;">pub fn </span><span>next_table_mut&lt;</span><span style="color:#569cd6;">&#39;a</span><span>&gt;(</span><span style="color:#569cd6;">&amp;&#39;a mut </span><span>self, index: </span><span style="color:#569cd6;">usize</span><span>) </span><span> -&gt; Option&lt;</span><span style="color:#569cd6;">&amp;&#39;a mut</span><span> Table&gt; </span><span>{</span><span style="color:#569cd6;">...</span><span>} </span></code></pre> <p>Note the additional lifetime parameters, which are identical for input and output references. That’s exactly what we want. It ensures that we can’t modify tables as long as we have references to lower tables. For example, it would be very bad if we could unmap a P3 table while we are still writing to one of its P2 tables.</p> <h4 id="safety"><a class="zola-anchor" href="#safety" aria-label="Anchor link for: safety">🔗</a>Safety</h4> <p>Now we can start at the <code>P4</code> constant and use the <code>next_table</code> functions to access the lower tables. And we don’t even need <code>unsafe</code> blocks to do it! 
Right now, your alarm bells should be ringing. Thanks to Rust, everything we’ve done before in this post was completely safe. But we just introduced two unsafe blocks to convince Rust that there are valid tables at the specified addresses. Can we really be sure?</p> <p>First, these addresses are only valid if the P4 table is mapped recursively. Since the paging module will be the only module that modifies page tables, we can introduce an invariant for the module:</p> <blockquote> <p><em>The 511th entry of the active P4 table must always be mapped to the active P4 table itself.</em></p> </blockquote> <p>So if we switch to another P4 table at some point, it needs to be recursively mapped <em>before</em> it becomes active. As long as we obey this invariant, we can safely use the special addresses. But even with this invariant, there is a big problem with the two methods:</p> <p><em>What happens if we call them on a P1 table?</em></p> <p>Well, they would calculate the address of the next table (which does not exist) and treat it as a page table. Either they construct an invalid address (if <code>XXX &lt; 400</code>)<sup class="footnote-reference"><a href="#fn-invalid-address">1</a></sup> or access the mapped page itself. That way, we could easily corrupt memory or cause CPU exceptions by accident. So these two functions are not <em>safe</em> in Rust terms. Thus we need to make them <code>unsafe</code> functions unless we find some clever solution.</p> <h2 id="some-clever-solution"><a class="zola-anchor" href="#some-clever-solution" aria-label="Anchor link for: some-clever-solution">🔗</a>Some Clever Solution</h2> <p>We can use Rust’s type system to statically guarantee that the <code>next_table</code> methods can only be called on P4, P3, and P2 tables, but not on a P1 table. 
The idea is to add a <code>Level</code> parameter to the <code>Table</code> type and implement the <code>next_table</code> methods only for level 4, 3, and 2.</p> <p>To model the levels we use a trait and empty enums:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub trait </span><span>TableLevel {} </span><span> </span><span style="color:#569cd6;">pub enum </span><span>Level4 {} </span><span style="color:#569cd6;">pub enum </span><span>Level3 {} </span><span style="color:#569cd6;">pub enum </span><span>Level2 {} </span><span style="color:#569cd6;">pub enum </span><span>Level1 {} </span><span> </span><span style="color:#569cd6;">impl </span><span>TableLevel </span><span style="color:#569cd6;">for </span><span>Level4 {} </span><span style="color:#569cd6;">impl </span><span>TableLevel </span><span style="color:#569cd6;">for </span><span>Level3 {} </span><span style="color:#569cd6;">impl </span><span>TableLevel </span><span style="color:#569cd6;">for </span><span>Level2 {} </span><span style="color:#569cd6;">impl </span><span>TableLevel </span><span style="color:#569cd6;">for </span><span>Level1 {} </span></code></pre> <p>An empty enum has size zero and disappears completely after compiling. Unlike an empty struct, it’s not possible to instantiate an empty enum. Since we will use <code>TableLevel</code> and the table levels in exported types, they need to be public.</p> <p>To differentiate the P1 table from the other tables, we introduce a <code>HierarchicalLevel</code> trait, which is a subtrait of <code>TableLevel</code>. 
But we implement it only for the levels 4, 3, and 2:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub trait </span><span>HierarchicalLevel: TableLevel {} </span><span> </span><span style="color:#569cd6;">impl </span><span>HierarchicalLevel </span><span style="color:#569cd6;">for </span><span>Level4 {} </span><span style="color:#569cd6;">impl </span><span>HierarchicalLevel </span><span style="color:#569cd6;">for </span><span>Level3 {} </span><span style="color:#569cd6;">impl </span><span>HierarchicalLevel </span><span style="color:#569cd6;">for </span><span>Level2 {} </span></code></pre> <p>Now we add the level parameter to the <code>Table</code> type:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>core::marker::PhantomData; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>Table&lt;L: TableLevel&gt; { </span><span> entries: [Entry; ENTRY_COUNT], </span><span> level: PhantomData&lt;L&gt;, </span><span>} </span></code></pre> <p>We need to add a <a href="https://doc.rust-lang.org/core/marker/struct.PhantomData.html#unused-type-parameters">PhantomData</a> field because unused type parameters are not allowed in Rust.</p> <p>Since we changed the <code>Table</code> type, we need to update every use of it:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub const </span><span style="color:#b4cea8;">P4</span><span>: </span><span style="color:#569cd6;">*mut </span><span>Table&lt;Level4&gt; = </span><span style="color:#b5cea8;">0xffffffff_fffff000 </span><span style="color:#569cd6;">as *mut _</span><span>; </span><span style="color:#569cd6;">... 
</span><span style="color:#569cd6;">impl</span><span>&lt;L&gt; Table&lt;L&gt; </span><span style="color:#569cd6;">where</span><span> L: TableLevel </span><span>{ </span><span> </span><span style="color:#569cd6;">pub fn </span><span>zero(</span><span style="color:#569cd6;">&amp;mut </span><span>self) {</span><span style="color:#569cd6;">...</span><span>} </span><span>} </span><span> </span><span style="color:#569cd6;">impl</span><span>&lt;L&gt; Table&lt;L&gt; </span><span style="color:#569cd6;">where</span><span> L: HierarchicalLevel </span><span>{ </span><span> </span><span style="color:#569cd6;">pub fn </span><span>next_table(</span><span style="color:#569cd6;">&amp;</span><span>self, index: </span><span style="color:#569cd6;">usize</span><span>) -&gt; Option&lt;</span><span style="color:#569cd6;">&amp;</span><span>Table&lt;</span><span style="color:#ff3333;">??</span><span style="color:#569cd6;">?</span><span>&gt;&gt; {</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>next_table_mut(</span><span style="color:#569cd6;">&amp;mut </span><span>self, index: </span><span style="color:#569cd6;">usize</span><span>) -&gt; Option&lt;</span><span style="color:#569cd6;">&amp;mut </span><span>Table&lt;</span><span style="color:#ff3333;">??</span><span style="color:#569cd6;">?</span><span>&gt;&gt; </span><span> {</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>next_table_address(</span><span style="color:#569cd6;">&amp;</span><span>self, index: </span><span style="color:#569cd6;">usize</span><span>) -&gt; Option&lt;</span><span style="color:#569cd6;">usize</span><span>&gt; {</span><span style="color:#569cd6;">...</span><span>} </span><span>} </span><span> </span><span style="color:#569cd6;">impl</span><span>&lt;L&gt; Index&lt;</span><span style="color:#569cd6;">usize</span><span>&gt; </span><span 
style="color:#569cd6;">for </span><span>Table&lt;L&gt; </span><span style="color:#569cd6;">where</span><span> L: TableLevel {</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span style="color:#569cd6;">impl</span><span>&lt;L&gt; IndexMut&lt;</span><span style="color:#569cd6;">usize</span><span>&gt; </span><span style="color:#569cd6;">for </span><span>Table&lt;L&gt; </span><span style="color:#569cd6;">where</span><span> L: TableLevel {</span><span style="color:#569cd6;">...</span><span>} </span></code></pre> <p>Now the <code>next_table</code> methods are only available for P4, P3, and P2 tables. But they have the incomplete return type <code>Table&lt;???&gt;</code> now. What should we fill in for the <code>???</code>?</p> <p>For a P4 table we would like to return a <code>Table&lt;Level3&gt;</code>, for a P3 table a <code>Table&lt;Level2&gt;</code>, and for a P2 table a <code>Table&lt;Level1&gt;</code>. So we want to return a table of the <em>next level</em>.</p> <p>We can define the next level by adding an associated type to the <code>HierarchicalLevel</code> trait:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub trait </span><span>HierarchicalLevel: TableLevel { </span><span> </span><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">NextLevel</span><span>: TableLevel; </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>HierarchicalLevel </span><span style="color:#569cd6;">for </span><span>Level4 { </span><span> </span><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">NextLevel </span><span>= Level3; </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>HierarchicalLevel </span><span style="color:#569cd6;">for </span><span>Level3 { </span><span> </span><span style="color:#569cd6;">type </span><span 
style="color:#4ec9b0;">NextLevel </span><span>= Level2; </span><span>} </span><span> </span><span style="color:#569cd6;">impl </span><span>HierarchicalLevel </span><span style="color:#569cd6;">for </span><span>Level2 { </span><span> </span><span style="color:#569cd6;">type </span><span style="color:#4ec9b0;">NextLevel </span><span>= Level1; </span><span>} </span></code></pre> <p>Now we can replace the <code>Table&lt;???&gt;</code> types with <code>Table&lt;L::NextLevel&gt;</code> types and our code works as intended. You can try it with a simple test function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>test() { </span><span> </span><span style="color:#569cd6;">let</span><span> p4 = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;</span><span>*</span><span style="color:#b4cea8;">P4 </span><span>}; </span><span> p4.next_table(</span><span style="color:#b5cea8;">42</span><span>) </span><span> .and_then(|p3| p3.next_table(</span><span style="color:#b5cea8;">1337</span><span>)) </span><span> .and_then(|p2| p2.next_table(</span><span style="color:#b5cea8;">0xdeadbeaf</span><span>)) </span><span> .and_then(|p1| p1.next_table(</span><span style="color:#b5cea8;">0xcafebabe</span><span>)) </span><span>} </span></code></pre> <p>Most of the indexes are completely out of bounds, so it would panic if it’s called. But we don’t need to call it since it already fails at compile time:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error: no method named `next_table` found for type </span><span> `&amp;memory::paging::table::Table&lt;memory::paging::table::Level1&gt;` </span><span> in the current scope </span></code></pre> <p>Remember that this is bare metal kernel code. We just used type system magic to make low-level page table manipulations safer. 
Rust is just awesome!</p> <h2 id="translating-addresses"><a class="zola-anchor" href="#translating-addresses" aria-label="Anchor link for: translating-addresses">🔗</a>Translating Addresses</h2> <p>Now let’s do something useful with our new module. We will create a function that translates a virtual address to the corresponding physical address. We add it to the <code>paging/mod.rs</code> module:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>translate(virtual_address: VirtualAddress) </span><span> -&gt; Option&lt;PhysicalAddress&gt; </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> offset = virtual_address % </span><span style="color:#b4cea8;">PAGE_SIZE</span><span>; </span><span> translate_page(Page::containing_address(virtual_address)) </span><span> .map(|frame| frame.number * </span><span style="color:#b4cea8;">PAGE_SIZE </span><span>+ offset) </span><span>} </span></code></pre> <p>It uses two functions we haven’t defined yet: <code>translate_page</code> and <code>Page::containing_address</code>. 
Let’s start with the latter:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>containing_address(address: VirtualAddress) -&gt; Page { </span><span> assert!(address &lt; </span><span style="color:#b5cea8;">0x0000_8000_0000_0000 </span><span style="color:#569cd6;">|| </span><span> address &gt;= </span><span style="color:#b5cea8;">0xffff_8000_0000_0000</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;invalid address: 0x{:x}&quot;</span><span>, address); </span><span> Page { number: address / </span><span style="color:#b4cea8;">PAGE_SIZE </span><span>} </span><span>} </span></code></pre> <p>The assertion is needed because there can be invalid addresses. Virtual addresses on x86 are only 48 bits long, and the remaining upper bits must be <em>sign extension</em>, i.e. copies of bit 47, the most significant bit of the 48-bit address. For example:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>invalid address: 0x0000_8000_0000_0000 </span><span>valid address: 0xffff_8000_0000_0000 </span><span> └── bit 47 </span></code></pre> <p>So the address space is split into two halves: the <em>higher half</em> containing addresses with sign extension and the <em>lower half</em> containing addresses without. 
Everything in between is invalid.</p> <p>Since we added <code>containing_address</code>, we add the inverse method as well (maybe we need it later):</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>start_address(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; </span><span style="color:#569cd6;">usize </span><span>{ </span><span> self.number * </span><span style="color:#b4cea8;">PAGE_SIZE </span><span>} </span></code></pre> <p>The other missing function, <code>translate_page</code>, looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>memory::Frame; </span><span> </span><span style="color:#569cd6;">fn </span><span>translate_page(page: Page) -&gt; Option&lt;Frame&gt; { </span><span> </span><span style="color:#569cd6;">use </span><span>self::entry::</span><span style="color:#b4cea8;">HUGE_PAGE</span><span>; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> p3 = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span style="color:#569cd6;">&amp;</span><span>*table::</span><span style="color:#b4cea8;">P4 </span><span>}.next_table(page.p4_index()); </span><span> </span><span> </span><span style="color:#569cd6;">let </span><span>huge_page </span><span style="color:#569cd6;">= </span><span>|| { </span><span> </span><span style="color:#608b4e;">// TODO </span><span> }; </span><span> </span><span> p3.and_then(|p3| p3.next_table(page.p3_index())) </span><span> .and_then(|p2| p2.next_table(page.p2_index())) </span><span> .and_then(|p1| p1[page.p1_index()].pointed_frame()) </span><span> .or_else(huge_page) </span><span>} </span></code></pre> <p>We use an unsafe block to convert the raw <code>P4</code> pointer to a reference. 
Then we use the <a href="https://doc.rust-lang.org/nightly/core/option/enum.Option.html#method.and_then">Option::and_then</a> function to go through the four table levels. If some entry along the way is <code>None</code>, we check if the page is a huge page through the (unimplemented) <code>huge_page</code> closure.</p> <p>The <code>Page::p*_index</code> functions return the different table indexes. They look like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>p4_index(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; </span><span style="color:#569cd6;">usize </span><span>{ </span><span> (self.number &gt;&gt; </span><span style="color:#b5cea8;">27</span><span>) </span><span style="color:#569cd6;">&amp; </span><span style="color:#b5cea8;">0o777 </span><span>} </span><span style="color:#569cd6;">fn </span><span>p3_index(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; </span><span style="color:#569cd6;">usize </span><span>{ </span><span> (self.number &gt;&gt; </span><span style="color:#b5cea8;">18</span><span>) </span><span style="color:#569cd6;">&amp; </span><span style="color:#b5cea8;">0o777 </span><span>} </span><span style="color:#569cd6;">fn </span><span>p2_index(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; </span><span style="color:#569cd6;">usize </span><span>{ </span><span> (self.number &gt;&gt; </span><span style="color:#b5cea8;">9</span><span>) </span><span style="color:#569cd6;">&amp; </span><span style="color:#b5cea8;">0o777 </span><span>} </span><span style="color:#569cd6;">fn </span><span>p1_index(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; </span><span style="color:#569cd6;">usize </span><span>{ </span><span> (self.number &gt;&gt; </span><span style="color:#b5cea8;">0</span><span>) </span><span style="color:#569cd6;">&amp; 
</span><span style="color:#b5cea8;">0o777 </span><span>} </span></code></pre> <h3 id="safety-1"><a class="zola-anchor" href="#safety-1" aria-label="Anchor link for: safety-1">🔗</a>Safety</h3> <p>We use an <code>unsafe</code> block to convert the raw <code>P4</code> pointer into a shared reference. It’s safe because we don’t create any <code>&amp;mut</code> references to the table right now and don’t switch the P4 table either. But as soon as we do something like that, we have to revisit this method.</p> <h3 id="huge-pages"><a class="zola-anchor" href="#huge-pages" aria-label="Anchor link for: huge-pages">🔗</a>Huge Pages</h3> <p>The <code>huge_page</code> closure calculates the corresponding frame if huge pages are used. Its content looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>p3.and_then(|p3| { </span><span> </span><span style="color:#569cd6;">let</span><span> p3_entry = </span><span style="color:#569cd6;">&amp;</span><span>p3[page.p3_index()]; </span><span> </span><span style="color:#608b4e;">// 1GiB page? 
</span><span> </span><span style="color:#569cd6;">if let </span><span>Some(start_frame) = p3_entry.pointed_frame() { </span><span> </span><span style="color:#569cd6;">if</span><span> p3_entry.flags().contains(</span><span style="color:#b4cea8;">HUGE_PAGE</span><span>) { </span><span> </span><span style="color:#608b4e;">// address must be 1GiB aligned </span><span> assert!(start_frame.number % (</span><span style="color:#b4cea8;">ENTRY_COUNT </span><span>* </span><span style="color:#b4cea8;">ENTRY_COUNT</span><span>) == </span><span style="color:#b5cea8;">0</span><span>); </span><span> </span><span style="color:#569cd6;">return </span><span>Some(Frame { </span><span> number: start_frame.number + page.p2_index() * </span><span> </span><span style="color:#b4cea8;">ENTRY_COUNT </span><span>+ page.p1_index(), </span><span> }); </span><span> } </span><span> } </span><span> </span><span style="color:#569cd6;">if let </span><span>Some(p2) = p3.next_table(page.p3_index()) { </span><span> </span><span style="color:#569cd6;">let</span><span> p2_entry = </span><span style="color:#569cd6;">&amp;</span><span>p2[page.p2_index()]; </span><span> </span><span style="color:#608b4e;">// 2MiB page? 
</span><span> </span><span style="color:#569cd6;">if let </span><span>Some(start_frame) = p2_entry.pointed_frame() { </span><span> </span><span style="color:#569cd6;">if</span><span> p2_entry.flags().contains(</span><span style="color:#b4cea8;">HUGE_PAGE</span><span>) { </span><span> </span><span style="color:#608b4e;">// address must be 2MiB aligned </span><span> assert!(start_frame.number % </span><span style="color:#b4cea8;">ENTRY_COUNT </span><span>== </span><span style="color:#b5cea8;">0</span><span>); </span><span> </span><span style="color:#569cd6;">return </span><span>Some(Frame { </span><span> number: start_frame.number + page.p1_index() </span><span> }); </span><span> } </span><span> } </span><span> } </span><span> None </span><span> }) </span></code></pre> <p>This function is much longer and more complex than the <code>translate_page</code> function itself. To avoid this complexity in the future, we will only work with standard 4KiB pages from now on.</p> <h2 id="mapping-pages"><a class="zola-anchor" href="#mapping-pages" aria-label="Anchor link for: mapping-pages">🔗</a>Mapping Pages</h2> <p>Let’s add a function that modifies the page tables to map a <code>Page</code> to a <code>Frame</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub use </span><span>self::entry::*; </span><span style="color:#569cd6;">use </span><span>memory::FrameAllocator; </span><span> </span><span style="color:#569cd6;">pub fn </span><span>map_to&lt;A&gt;(page: Page, frame: Frame, flags: EntryFlags, </span><span> allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A) </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> p4 = </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span 
style="color:#569cd6;">&amp;mut </span><span>*</span><span style="color:#b4cea8;">P4 </span><span>}; </span><span> </span><span style="color:#569cd6;">let mut</span><span> p3 = p4.next_table_create(page.p4_index(), allocator); </span><span> </span><span style="color:#569cd6;">let mut</span><span> p2 = p3.next_table_create(page.p3_index(), allocator); </span><span> </span><span style="color:#569cd6;">let mut</span><span> p1 = p2.next_table_create(page.p2_index(), allocator); </span><span> </span><span> assert!(p1[page.p1_index()].is_unused()); </span><span> p1[page.p1_index()].set(frame, flags </span><span style="color:#569cd6;">| </span><span style="color:#b4cea8;">PRESENT</span><span>); </span><span>} </span></code></pre> <p>We add a re-export for all <code>entry</code> types since they are required to call the function. We assert that the page is unmapped and always set the present flag (since it wouldn’t make sense to map a page without setting it).</p> <p>The <code>Table::next_table_create</code> method doesn’t exist yet. It should return the next table if it exists, or create a new one. 
For the implementation we need the <code>FrameAllocator</code> from the <a href="https://os.phil-opp.com/allocating-frames/#a-memory-module">previous post</a> and the <code>Table::zero</code> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>memory::FrameAllocator; </span><span> </span><span style="color:#569cd6;">pub fn </span><span>next_table_create&lt;A&gt;(</span><span style="color:#569cd6;">&amp;mut </span><span>self, </span><span> index: </span><span style="color:#569cd6;">usize</span><span>, </span><span> allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A) </span><span> -&gt; </span><span style="color:#569cd6;">&amp;mut </span><span>Table&lt;</span><span style="color:#569cd6;">L::</span><span>NextLevel&gt; </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{ </span><span> </span><span style="color:#569cd6;">if </span><span>self.next_table(index).is_none() { </span><span> assert!(</span><span style="color:#569cd6;">!</span><span>self.entries[index].flags().contains(</span><span style="color:#b4cea8;">HUGE_PAGE</span><span>), </span><span> </span><span style="color:#d69d85;">&quot;mapping code does not support huge pages&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">let</span><span> frame = allocator.allocate_frame().expect(</span><span style="color:#d69d85;">&quot;no frames available&quot;</span><span>); </span><span> self.entries[index].set(frame, </span><span style="color:#b4cea8;">PRESENT </span><span style="color:#569cd6;">| </span><span style="color:#b4cea8;">WRITABLE</span><span>); </span><span> self.next_table_mut(index).unwrap().zero(); </span><span> } </span><span> self.next_table_mut(index).unwrap() </span><span>} </span></code></pre> <p>We can use <code>unwrap()</code> here since the next table definitely 
exists.</p> <h3 id="safety-2"><a class="zola-anchor" href="#safety-2" aria-label="Anchor link for: safety-2">🔗</a>Safety</h3> <p>We used an <code>unsafe</code> block in <code>map_to</code> to convert the raw <code>P4</code> pointer to a <code>&amp;mut</code> reference. That’s bad. It’s now possible that the <code>&amp;mut</code> reference is not exclusive, which breaks Rust’s guarantees. It’s only a matter of time before we run into a data race. For example, imagine that one thread maps an entry to <code>frame_A</code> and another thread (on the same core) tries to map the same entry to <code>frame_B</code>.</p> <p>The problem is that there’s no clear <em>owner</em> for the page tables. So let’s define page table ownership!</p> <h3 id="page-table-ownership"><a class="zola-anchor" href="#page-table-ownership" aria-label="Anchor link for: page-table-ownership">🔗</a>Page Table Ownership</h3> <p>We define the following:</p> <blockquote> <p>A page table owns all of its subtables.</p> </blockquote> <p>We already obey this rule: To get a reference to a table, we need to borrow it from its parent table through the <code>next_table</code> method. But who owns the P4 table?</p> <blockquote> <p>The recursively mapped P4 table is owned by an <code>ActivePageTable</code> struct.</p> </blockquote> <p>We just defined an arbitrary owner for the P4 table. But it will solve our problems. 
And it will also provide the interface to other modules.</p> <p>So let’s create the struct:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>self::table::{Table, Level4}; </span><span style="color:#569cd6;">use </span><span>core::ptr::Unique; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>ActivePageTable { </span><span> p4: Unique&lt;Table&lt;Level4&gt;&gt;, </span><span>} </span></code></pre> <p>We can’t store the <code>Table&lt;Level4&gt;</code> directly because it needs to be at a special memory location (like the <a href="https://os.phil-opp.com/printing-to-screen/#the-text-buffer">VGA text buffer</a>). We could use a raw pointer or <code>&amp;mut</code> instead of <a href="https://doc.rust-lang.org/1.10.0/core/ptr/struct.Unique.html">Unique</a>, but Unique indicates ownership better.</p> <p>Because the <code>ActivePageTable</code> owns the unique, recursively mapped P4 table, there must be only one <code>ActivePageTable</code> instance. 
Thus we make the constructor function unsafe:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">impl </span><span>ActivePageTable { </span><span> </span><span style="color:#569cd6;">pub unsafe fn </span><span>new() -&gt; ActivePageTable { </span><span> ActivePageTable { </span><span> p4: Unique::new_unchecked(table::</span><span style="color:#b4cea8;">P4</span><span>), </span><span> } </span><span> } </span><span>} </span></code></pre> <p>We add some methods to get P4 references:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>p4(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; </span><span style="color:#569cd6;">&amp;</span><span>Table&lt;Level4&gt; { </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ self.p4.as_ref() } </span><span>} </span><span> </span><span style="color:#569cd6;">fn </span><span>p4_mut(</span><span style="color:#569cd6;">&amp;mut </span><span>self) -&gt; </span><span style="color:#569cd6;">&amp;mut </span><span>Table&lt;Level4&gt; { </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ self.p4.as_mut() } </span><span>} </span></code></pre> <p>Since we will only create valid P4 pointers, the <code>unsafe</code> blocks are safe. However, we don’t make these functions public since they can be used to make page tables invalid. 
Only the higher level functions (such as <code>translate</code> or <code>map_to</code>) should be usable from other modules.</p> <p>Now we can make the <code>map_to</code> and <code>translate</code> functions safe by making them methods of <code>ActivePageTable</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">impl </span><span>ActivePageTable { </span><span> </span><span style="color:#569cd6;">pub unsafe fn </span><span>new() -&gt; ActivePageTable {</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>p4(</span><span style="color:#569cd6;">&amp;</span><span>self) -&gt; </span><span style="color:#569cd6;">&amp;</span><span>Table&lt;Level4&gt; {</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>p4_mut(</span><span style="color:#569cd6;">&amp;mut </span><span>self) -&gt; </span><span style="color:#569cd6;">&amp;mut </span><span>Table&lt;Level4&gt; {</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>translate(</span><span style="color:#569cd6;">&amp;</span><span>self, virtual_address: VirtualAddress) </span><span> -&gt; Option&lt;PhysicalAddress&gt; </span><span> { </span><span> </span><span style="color:#569cd6;">... 
</span><span> self.translate_page(</span><span style="color:#569cd6;">...</span><span>).map(</span><span style="color:#569cd6;">...</span><span>) </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>translate_page(</span><span style="color:#569cd6;">&amp;</span><span>self, page: Page) -&gt; Option&lt;Frame&gt; { </span><span> </span><span style="color:#569cd6;">let</span><span> p3 = self.p4().next_table(</span><span style="color:#569cd6;">...</span><span>); </span><span> </span><span style="color:#569cd6;">... </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">pub fn </span><span>map_to&lt;A&gt;(</span><span style="color:#569cd6;">&amp;mut </span><span>self, </span><span> page: Page, </span><span> frame: Frame, </span><span> flags: EntryFlags, </span><span> allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A) </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span> { </span><span> </span><span style="color:#569cd6;">let mut</span><span> p3 = self.p4_mut().next_table_create(</span><span style="color:#569cd6;">...</span><span>); </span><span> </span><span style="color:#569cd6;">... 
</span><span> } </span><span>} </span></code></pre> <p>Now the <code>p4()</code> and <code>p4_mut()</code> methods should be the only methods containing an <code>unsafe</code> block in the <code>paging/mod.rs</code> file.</p> <h3 id="more-mapping-functions"><a class="zola-anchor" href="#more-mapping-functions" aria-label="Anchor link for: more-mapping-functions">🔗</a>More Mapping Functions</h3> <p>For convenience, we add a <code>map</code> method that just picks a free frame for us:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>map&lt;A&gt;(</span><span style="color:#569cd6;">&amp;mut </span><span>self, page: Page, flags: EntryFlags, allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A) </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> frame = allocator.allocate_frame().expect(</span><span style="color:#d69d85;">&quot;out of memory&quot;</span><span>); </span><span> self.map_to(page, frame, flags, allocator) </span><span>} </span></code></pre> <p>We also add an <code>identity_map</code> function to make it easier to remap the kernel in the next post:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>identity_map&lt;A&gt;(</span><span style="color:#569cd6;">&amp;mut </span><span>self, </span><span> frame: Frame, </span><span> flags: EntryFlags, </span><span> allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A) </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> page = Page::containing_address(frame.start_address()); 
</span><span> self.map_to(page, frame, flags, allocator) </span><span>} </span></code></pre> <h3 id="unmapping-pages"><a class="zola-anchor" href="#unmapping-pages" aria-label="Anchor link for: unmapping-pages">🔗</a>Unmapping Pages</h3> <p>To unmap a page, we set the corresponding P1 entry to unused:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">fn </span><span>unmap&lt;A&gt;(</span><span style="color:#569cd6;">&amp;mut </span><span>self, page: Page, allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A) </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{ </span><span> assert!(self.translate(page.start_address()).is_some()); </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> p1 = self.p4_mut() </span><span> .next_table_mut(page.p4_index()) </span><span> .and_then(|p3| p3.next_table_mut(page.p3_index())) </span><span> .and_then(|p2| p2.next_table_mut(page.p2_index())) </span><span> .expect(</span><span style="color:#d69d85;">&quot;mapping code does not support huge pages&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">let</span><span> frame = p1[page.p1_index()].pointed_frame().unwrap(); </span><span> p1[page.p1_index()].set_unused(); </span><span> </span><span style="color:#608b4e;">// TODO free p(1,2,3) table if empty </span><span> allocator.deallocate_frame(frame); </span><span>} </span></code></pre> <p>The assertion ensures that the page is mapped. Thus the corresponding P1 table and frame must exist for a standard 4KiB page. We set the entry to unused and free the associated frame in the supplied frame allocator.</p> <p>We can also free the P1, P2, or even P3 table when the last entry is freed. But checking the whole table on every <code>unmap</code> would be very expensive. 
So we leave the <code>TODO</code> in place until we find a good solution. I’m open to suggestions :).</p> <p><em>Spoiler</em>: There is an ugly bug in this function, which we will find in the next section.</p> <h2 id="testing-and-bugfixing"><a class="zola-anchor" href="#testing-and-bugfixing" aria-label="Anchor link for: testing-and-bugfixing">🔗</a>Testing and Bugfixing</h2> <p>To test the mapping code, we add a <code>test_paging</code> function in <code>memory/paging/mod.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>test_paging&lt;A&gt;(allocator: </span><span style="color:#569cd6;">&amp;mut</span><span> A) </span><span> </span><span style="color:#569cd6;">where</span><span> A: FrameAllocator </span><span>{ </span><span> </span><span style="color:#569cd6;">let mut</span><span> page_table = </span><span style="color:#569cd6;">unsafe </span><span>{ ActivePageTable::new() }; </span><span> </span><span> </span><span style="color:#608b4e;">// test it </span><span>} </span></code></pre> <p>We borrow the frame allocator since we will need it for the mapping functions. 
To be able to call that function from main, we need to re-export it in <code>memory/mod.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in memory/mod.rs </span><span style="color:#569cd6;">pub use </span><span>self::paging::test_paging; </span><span> </span><span style="color:#608b4e;">// lib.rs </span><span style="color:#569cd6;">let mut</span><span> frame_allocator = </span><span style="color:#569cd6;">...</span><span>; </span><span>memory::test_paging(</span><span style="color:#569cd6;">&amp;mut</span><span> frame_allocator); </span></code></pre> <h3 id="map-to"><a class="zola-anchor" href="#map-to" aria-label="Anchor link for: map-to">🔗</a>map_to</h3> <p>Let’s test the <code>map_to</code> function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">let</span><span> addr = </span><span style="color:#b5cea8;">42 </span><span>* </span><span style="color:#b5cea8;">512 </span><span>* </span><span style="color:#b5cea8;">512 </span><span>* </span><span style="color:#b5cea8;">4096</span><span>; </span><span style="color:#608b4e;">// 42nd P3 entry </span><span style="color:#569cd6;">let</span><span> page = Page::containing_address(addr); </span><span style="color:#569cd6;">let</span><span> frame = allocator.allocate_frame().expect(</span><span style="color:#d69d85;">&quot;no more frames&quot;</span><span>); </span><span>println!(</span><span style="color:#d69d85;">&quot;None = </span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">, map to </span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span> page_table.translate(addr), </span><span> frame); </span><span>page_table.map_to(page, frame, EntryFlags::empty(), allocator); 
</span><span>println!(</span><span style="color:#d69d85;">&quot;Some = </span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, page_table.translate(addr)); </span><span>println!(</span><span style="color:#d69d85;">&quot;next free frame: </span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, allocator.allocate_frame()); </span></code></pre> <p>We just map some random page to a free frame. To be able to borrow the page table as <code>&amp;mut</code>, we need to make it mutable.</p> <p>You should see output similar to this:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>None = None, map to Frame { number: 0 } </span><span>Some = Some(0) </span><span>next free frame: Some(Frame { number: 3 }) </span></code></pre> <p>It’s frame 0 because it’s the first frame returned by the frame allocator. Since we map the 42nd P3 entry, the mapping code needs to create a P2 and a P1 table, which take up frames 1 and 2. So the next free frame returned by the allocator is frame 3.</p> <h3 id="unmap"><a class="zola-anchor" href="#unmap" aria-label="Anchor link for: unmap">🔗</a>unmap</h3> <p>To test the <code>unmap</code> function, we unmap the test page so that it translates to <code>None</code> again:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>page_table.unmap(Page::containing_address(addr), allocator); </span><span>println!(</span><span style="color:#d69d85;">&quot;None = </span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, page_table.translate(addr)); </span></code></pre> <p>It causes a panic since we call the unimplemented <code>deallocate_frame</code> method in <code>unmap</code>. If we comment this call out, it works without problems. 
But there is still a bug in this function.</p> <p>Let’s read something from the mapped page (of course before we unmap it again):</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{:#x}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#569cd6;">unsafe </span><span>{ </span><span> *(Page::containing_address(addr).start_address() </span><span style="color:#569cd6;">as *const u64</span><span>) </span><span>}); </span></code></pre> <p>Since we don’t zero the mapped pages, the output is random. For me, it’s <code>0xf000ff53f000ff53</code>.</p> <p>If <code>unmap</code> worked correctly, reading it again after unmapping should cause a page fault. But it doesn’t. Instead, it just prints the same number again. When we remove the first read, we get the desired page fault (i.e. QEMU reboots again and again). So this seems to be some cache issue.</p> <p>An x86 processor has many different caches because always accessing the main memory would be very slow. Most of these caches are completely <em>transparent</em>. That means everything works exactly the same as without them; it’s just much faster. But there is one cache that needs to be updated manually: the <em>translation lookaside buffer</em>.</p> <p>The translation lookaside buffer, or TLB, caches the translation of virtual to physical addresses. It’s filled automatically when a page is accessed. But it’s not updated transparently when the mapping of a page changes. This is the reason that we can still access the page even though we unmapped it in the page table.</p> <p>So to fix our <code>unmap</code> function, we need to remove the cached translation from the TLB. We can use the <a href="https://docs.rs/x86_64">x86_64</a> crate to do this easily. 
To add it, we append the following to our <code>Cargo.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#ff3333;">... </span><span style="color:#569cd6;">x86_64 </span><span>= </span><span style="color:#d69d85;">&quot;0.1.2&quot; </span></code></pre> <p>Now we can use it to fix <code>unmap</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">... </span><span> p1[page.p1_index()].set_unused(); </span><span> </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::instructions::tlb; </span><span> </span><span style="color:#569cd6;">use </span><span>x86_64::VirtualAddress; </span><span> tlb::flush(VirtualAddress(page.start_address())); </span><span> </span><span> </span><span style="color:#608b4e;">// TODO free p(1,2,3) table if empty </span><span> </span><span style="color:#608b4e;">//allocator.deallocate_frame(frame); </span><span>} </span></code></pre> <p>Now the desired page fault occurs even when we access the page before unmapping it.</p> <h2 id="conclusion"><a class="zola-anchor" href="#conclusion" aria-label="Anchor link for: conclusion">🔗</a>Conclusion</h2> <p>This post has become pretty long. 
So let’s summarize what we’ve done:</p> <ul> <li>we created a paging module and modeled page tables plus entries</li> <li>we mapped the P4 table recursively and created <code>next_table</code> methods</li> <li>we used empty enums and associated types to make the <code>next_table</code> functions safe</li> <li>we wrote a function to translate virtual to physical addresses</li> <li>we created safe functions to map and unmap pages</li> <li>and we fixed stack overflow and TLB-related bugs</li> </ul> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>In the <a href="https://os.phil-opp.com/remap-the-kernel/">next post</a> we will extend this module and add a function to modify inactive page tables. Through that function, we will create a new page table hierarchy that maps the kernel correctly using 4KiB pages. Then we will switch to the new table to get a safer kernel environment.</p> <p>Afterwards, we will use this paging module to build a heap allocator. This will allow us to use allocation and collection types such as <code>Box</code> and <code>Vec</code>.</p> <p><small>Image sources: <sup class="footnote-reference"><a href="#virtual_physical_translation_source">2</a></sup></small></p> <h2 id="footnotes"><a class="zola-anchor" href="#footnotes" aria-label="Anchor link for: footnotes">🔗</a>Footnotes</h2> <div class="footnote-definition" id="fn-invalid-address"><sup class="footnote-definition-label">1</sup> <p>If the <code>XXX</code> part of the address is smaller than <code>0o400</code>, its binary representation doesn’t start with <code>1</code>. But the sign extension bits, which should be a copy of that bit, are <code>1</code> instead of <code>0</code>. 
Thus the address is not valid.</p> </div> <div class="footnote-definition" id="virtual_physical_translation_source"><sup class="footnote-definition-label">2</sup> <p>Image sources: Modified versions of an image from <a href="https://commons.wikimedia.org/wiki/File:X86_Paging_64bit.svg">Wikipedia</a>. The modified files are licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.</p> </div> Allocating Frames Sun, 15 Nov 2015 00:00:00 +0000 https://os.phil-opp.com/allocating-frames/ https://os.phil-opp.com/allocating-frames/ <p>In this post we create an allocator that provides free physical frames for a future paging module. To get the required information about available and used memory we use the Multiboot information structure. Additionally, we improve the <code>panic</code> handler to print the corresponding message and source line.</p> <span id="continue-reading"></span> <p>The full source code is available on <a href="https://github.com/phil-opp/blog_os/tree/first_edition_post_5">GitHub</a>. Feel free to open issues there if you have any problems or improvements. You can also leave a comment at the bottom.</p> <h2 id="preparation"><a class="zola-anchor" href="#preparation" aria-label="Anchor link for: preparation">🔗</a>Preparation</h2> <p>We still have a really tiny stack of 64 bytes, which won’t suffice for this post. So we increase it to 16kB (four pages) in <code>boot.asm</code>:</p> <pre data-lang="asm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-asm "><code class="language-asm" data-lang="asm"><span>section .bss </span><span>... 
</span><span>stack_bottom: </span><span> resb </span><span style="color:#b4cea8;">4096 </span><span>* </span><span style="color:#b4cea8;">4 </span><span>stack_top: </span></code></pre> <h2 id="the-multiboot-information-structure"><a class="zola-anchor" href="#the-multiboot-information-structure" aria-label="Anchor link for: the-multiboot-information-structure">🔗</a>The Multiboot Information Structure</h2> <p>When a Multiboot compliant bootloader loads a kernel, it passes a pointer to a boot information structure in the <code>ebx</code> register. We can use it to get information about available memory and loaded kernel sections.</p> <p>First, we need to pass this pointer to our kernel as an argument to <code>rust_main</code>. To find out how arguments are passed to functions, we can look at the <a href="https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI">calling convention of Linux</a>:</p> <blockquote> <p>The first six integer or pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, and R9</p> </blockquote> <p>So to pass the pointer to our kernel, we need to move it to <code>rdi</code> before calling the kernel. 
Since we’re not using the <code>rdi</code>/<code>edi</code> register in our bootstrap code, we can simply set the <code>edi</code> register right after booting (in <code>boot.asm</code>):</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>start: </span><span> </span><span style="color:#569cd6;">mov </span><span>esp, stack_top </span><span> </span><span style="color:#569cd6;">mov </span><span>edi, ebx</span><span style="color:#608b4e;"> ; Move Multiboot info pointer to edi </span></code></pre> <p>Now we can add the argument to our <code>rust_main</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub extern fn </span><span>rust_main(multiboot_information_address: </span><span style="color:#569cd6;">usize</span><span>) { </span><span style="color:#569cd6;">... </span><span>} </span></code></pre> <p>Instead of writing our own Multiboot module, we use the <a href="https://docs.rs/multiboot2">multiboot2</a> crate. It gives us some basic information about mapped kernel sections and available memory. I just wrote it for this blog post since I could not find any other Multiboot 2 crate. It’s still incomplete, but it does its job.</p> <p>So let’s add a dependency on it:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#ff3333;">... 
</span><span style="color:#569cd6;">multiboot2 </span><span>= </span><span style="color:#d69d85;">&quot;0.1.0&quot; </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span style="color:#569cd6;">extern crate</span><span> multiboot2; </span></code></pre> <p>Now we can use it to print available memory areas.</p> <h3 id="available-memory"><a class="zola-anchor" href="#available-memory" aria-label="Anchor link for: available-memory">🔗</a>Available Memory</h3> <p>The boot information structure consists of various <em>tags</em>. See section 3.4 of the Multiboot specification (<a href="https://nongnu.askapache.com/grub/phcoder/multiboot.pdf">PDF</a>) for a complete list. The <em>memory map</em> tag contains a list of all available RAM areas. Special areas such as the VGA text buffer at <code>0xb8000</code> are not available. Note that some of the available memory is already used by our kernel and by the multiboot information structure itself.</p> <p>To print all available memory areas, we can use the <code>multiboot2</code> crate in our <code>rust_main</code> as follows:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">let</span><span> boot_info = </span><span style="color:#569cd6;">unsafe</span><span>{ multiboot2::load(multiboot_information_address) }; </span><span style="color:#569cd6;">let</span><span> memory_map_tag = boot_info.memory_map_tag() </span><span> .expect(</span><span style="color:#d69d85;">&quot;Memory map tag required&quot;</span><span>); </span><span> </span><span>println!(</span><span style="color:#d69d85;">&quot;memory areas:&quot;</span><span>); </span><span style="color:#569cd6;">for</span><span> area </span><span style="color:#569cd6;">in</span><span> 
memory_map_tag.memory_areas() { </span><span> println!(</span><span style="color:#d69d85;">&quot; start: 0x</span><span style="color:#b4cea8;">{:x}</span><span style="color:#d69d85;">, length: 0x</span><span style="color:#b4cea8;">{:x}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span> area.base_addr, area.length); </span><span>} </span></code></pre> <p>The <code>load</code> function is <code>unsafe</code> because it relies on a valid address. Since the memory tag is not required by the Multiboot specification, the <code>memory_map_tag()</code> function returns an <code>Option</code>. The <code>memory_areas()</code> function returns the desired memory area iterator.</p> <p>The output looks like this:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>Hello World! </span><span>memory areas: </span><span> start: 0x0, length: 0x9fc00 </span><span> start: 0x100000, length: 0x7ee0000 </span></code></pre> <p>So we have one area from <code>0x0</code> to <code>0x9fc00</code>, which is a bit below the 1MiB mark. The second, bigger area starts at 1MiB and contains the rest of available memory. The area from <code>0x9fc00</code> to 1MiB is not available since it contains for example the VGA text buffer at <code>0xb8000</code>. This is the reason for putting our kernel at 1MiB and not somewhere below.</p> <p>If you give QEMU more than 4GiB of memory by passing <code>-m 5G</code>, you get another unusable area below the 4GiB mark. This memory is normally mapped to some hardware devices. See the <a href="https://wiki.osdev.org/Memory_Map_(x86)">OSDev Wiki</a> for more information.</p> <h3 id="handling-panics"><a class="zola-anchor" href="#handling-panics" aria-label="Anchor link for: handling-panics">🔗</a>Handling Panics</h3> <p>We used <code>expect</code> in the code above, which will panic if there is no memory map tag. But our current panic handler just loops without printing any error message. 
Of course we could replace <code>expect</code> with a <code>match</code>, but we should fix the panic handler nonetheless:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[lang </span><span style="color:#569cd6;">= </span><span style="color:#d69d85;">&quot;panic_fmt&quot;</span><span>] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern fn </span><span>panic_fmt() -&gt; </span><span style="color:#569cd6;">! </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;PANIC&quot;</span><span>); </span><span> </span><span style="color:#569cd6;">loop</span><span>{} </span><span>} </span></code></pre> <p>Now we get a <code>PANIC</code> message. But we can do even better. The <code>panic_fmt</code> function actually takes some arguments:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[lang </span><span style="color:#569cd6;">= </span><span style="color:#d69d85;">&quot;panic_fmt&quot;</span><span>] </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern fn </span><span>panic_fmt(fmt: core::fmt::Arguments, file: </span><span style="color:#569cd6;">&amp;&#39;static str</span><span>, </span><span> line: </span><span style="color:#569cd6;">u32</span><span>) -&gt; </span><span style="color:#569cd6;">! 
</span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n\n</span><span style="color:#d69d85;">PANIC in </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;"> at line </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">:&quot;</span><span>, file, line); </span><span> println!(</span><span style="color:#d69d85;">&quot; </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, fmt); </span><span> </span><span style="color:#569cd6;">loop</span><span>{} </span><span>} </span></code></pre> <p>Be careful with these arguments as the compiler does not check the function signature for <code>lang_items</code>.</p> <p>Now we get the panic message and the causing source line. You can try it by inserting a <code>panic</code> somewhere.</p> <h3 id="kernel-elf-sections"><a class="zola-anchor" href="#kernel-elf-sections" aria-label="Anchor link for: kernel-elf-sections">🔗</a>Kernel ELF Sections</h3> <p>To read and print the sections of our kernel ELF file, we can use the <em>Elf-sections</em> tag:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">let</span><span> elf_sections_tag = boot_info.elf_sections_tag() </span><span> .expect(</span><span style="color:#d69d85;">&quot;Elf-sections tag required&quot;</span><span>); </span><span> </span><span>println!(</span><span style="color:#d69d85;">&quot;kernel sections:&quot;</span><span>); </span><span style="color:#569cd6;">for</span><span> section </span><span style="color:#569cd6;">in</span><span> elf_sections_tag.sections() { </span><span> println!(</span><span style="color:#d69d85;">&quot; addr: 0x</span><span style="color:#b4cea8;">{:x}</span><span style="color:#d69d85;">, size: 0x</span><span style="color:#b4cea8;">{:x}</span><span style="color:#d69d85;">, flags: 
0x</span><span style="color:#b4cea8;">{:x}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span> section.addr, section.size, section.flags); </span><span>} </span></code></pre> <p>This should print out the start address and size of all kernel sections. If the section is writable, the <code>0x1</code> bit is set in <code>flags</code>. The <code>0x4</code> bit marks an executable section and the <code>0x2</code> bit indicates that the section was loaded in memory. For example, the <code>.text</code> section is executable but not writable and the <code>.data</code> section is just the opposite.</p> <p>But when we execute it, tons of really small sections are printed. We can use the <code>objdump -h build/kernel-x86_64.bin</code> command to list the sections with their names. There seem to be over 200 sections and many of them start with <code>.text.*</code> or <code>.data.rel.ro.local.*</code>. This is because the Rust compiler puts e.g. each function in its own <code>.text</code> subsection. That way, unused functions are removed when the linker omits unused sections.</p> <p>To merge these subsections, we need to update our linker script:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>ENTRY(start) </span><span> </span><span>SECTIONS { </span><span> . = 1M; </span><span> </span><span> .boot : </span><span> { </span><span> KEEP(*(.multiboot_header)) </span><span> } </span><span> </span><span> .text : </span><span> { </span><span> *(.text .text.*) </span><span> } </span><span> </span><span> .rodata : { </span><span> *(.rodata .rodata.*) </span><span> } </span><span> </span><span> .data.rel.ro : { </span><span> *(.data.rel.ro.local*) *(.data.rel.ro .data.rel.ro.*) </span><span> } </span><span>} </span></code></pre> <p>These lines are taken from the default linker script of <code>ld</code>, which can be obtained through <code>ld --verbose</code>. 
The <code>.text</code> <em>output</em> section now contains all <code>.text.*</code> <em>input</em> sections of the static library (and the same applies to the <code>.rodata</code> and <code>.data.rel.ro</code> sections).</p> <p>Now there are only 12 sections left and we get a much more useful output:</p> <p><img src="https://os.phil-opp.com/allocating-frames/qemu-memory-areas-and-kernel-sections.png" alt="qemu output" /></p> <p>If you like, you can compare this output to the <code>objdump -h build/kernel-x86_64.bin</code> output. You will see that the start addresses and sizes match exactly for each section. The sections with flags <code>0x0</code> are mostly debug sections, so they don’t need to be loaded. And the last few sections of the QEMU output aren’t in the <code>objdump</code> output because they are special sections such as string tables.</p> <h3 id="start-and-end-of-kernel"><a class="zola-anchor" href="#start-and-end-of-kernel" aria-label="Anchor link for: start-and-end-of-kernel">🔗</a>Start and End of Kernel</h3> <p>We can now use the ELF section tag to calculate the start and end address of our loaded kernel:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">let</span><span> kernel_start = elf_sections_tag.sections().map(|s| s.addr) </span><span> .min().unwrap(); </span><span style="color:#569cd6;">let</span><span> kernel_end = elf_sections_tag.sections().map(|s| s.addr + s.size) </span><span> .max().unwrap(); </span></code></pre> <p>The other used memory area is the Multiboot Information structure:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">let</span><span> multiboot_start = multiboot_information_address; </span><span style="color:#569cd6;">let</span><span> multiboot_end = multiboot_start + 
(boot_info.total_size </span><span style="color:#569cd6;">as usize</span><span>); </span></code></pre> <p>Printing these numbers gives us:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>kernel_start: 0x100000, kernel_end: 0x11a168 </span><span>multiboot_start: 0x11d400, multiboot_end: 0x11d9c8 </span></code></pre> <p>So the kernel starts at 1MiB (as expected) and is about 104 KiB in size (<code>0x11a168 - 0x100000 = 0x1a168</code> bytes). The multiboot information structure was placed at <code>0x11d400</code> by GRUB and needs 1480 bytes. Of course your numbers could be a bit different due to different versions of Rust or GRUB (or some differences in the source code).</p> <h2 id="a-frame-allocator"><a class="zola-anchor" href="#a-frame-allocator" aria-label="Anchor link for: a-frame-allocator">🔗</a>A frame allocator</h2> <p>When using paging, the physical memory is split into equally sized chunks (normally 4096 bytes). Such a chunk is called a “physical page” or “frame”. These frames can be mapped to any virtual page through page tables. For more information about paging take a peek at the <a href="https://os.phil-opp.com/page-tables/">next post</a>.</p> <p>We will need a free frame in many cases. For example, when we want to increase the size of our future kernel heap. Or when we create a new page table. Or when we add a new kernel thread and thus need to allocate a new stack. So we need some kind of allocator that keeps track of physical frames and gives us a free one when needed.</p> <p>There are various ways to write such a frame allocator:</p> <p>We could create some kind of linked list from the free frames. For example, each frame could begin with a pointer to the next free frame. Since the frames are free, this would not overwrite any data. Our allocator would just save the head of the list and could easily allocate and deallocate frames by updating pointers. Unfortunately, this approach has a problem: It requires reading and writing these free frames. 
So we would need to map all physical frames to some virtual address, at least temporarily. Another disadvantage is that we need to create this linked list at startup. That implies that we need to set over one million pointers at startup if the machine has 4GiB of RAM (4GiB / 4KiB = 1,048,576 frames).</p> <p>Another approach is to create some kind of data structure such as a <a href="https://wiki.osdev.org/Page_Frame_Allocation#Physical_Memory_Allocators">bitmap or a stack</a> to manage free frames. We could place it in the already identity-mapped area right behind the kernel or multiboot structure. That way we would not need to (temporarily) map each free frame. But it has the same problem of slow initial creation and filling. In fact, we will use this approach in a future post to manage frames that are freed again. But for the initial management of free frames, we use a different method.</p> <p>In the following, we will use Multiboot’s memory map directly. The idea is to maintain a simple counter that starts at frame 0 and is increased continuously. If the current frame is available (part of an available area in the memory map) and not used by the kernel or the multiboot structure (we know their start and end addresses), we know that it’s free and return it. Otherwise, we increase the counter to the next possibly free frame. That way, we don’t need to create a data structure when booting and the physical frames can remain unmapped. 
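</p> <p>As a rough illustration of this counter idea, here is a minimal standalone sketch. It is not the allocator built below: the <code>Area</code> struct, the <code>next_free</code> function, and its <code>limit</code> parameter are hypothetical names made up for this example, the used ranges are passed as plain frame-number pairs, and the counter simply steps one frame at a time instead of jumping over unavailable regions. The numbers in <code>main</code> are the example values printed above.</p>
<pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust">const PAGE_SIZE: usize = 4096;

// Hypothetical stand-in for a memory map entry (start address and length in bytes).
struct Area {
    base: usize,
    length: usize,
}

// Returns the next free frame number at or after `*counter`, or `None` once
// `limit` is reached. `used` holds inclusive (first, last) frame ranges.
fn next_free(
    counter: &amp;mut usize,
    areas: &amp;[Area],
    used: &amp;[(usize, usize)],
    limit: usize,
) -&gt; Option&lt;usize&gt; {
    while *counter &lt; limit {
        let frame = *counter;
        *counter += 1;
        // Available: some area of the memory map contains the frame.
        let available = areas.iter().any(|a| {
            let first = a.base / PAGE_SIZE;
            let last = (a.base + a.length - 1) / PAGE_SIZE;
            first &lt;= frame &amp;&amp; frame &lt;= last
        });
        // Used: the frame belongs to the kernel or the multiboot structure.
        let in_use = used.iter().any(|r| r.0 &lt;= frame &amp;&amp; frame &lt;= r.1);
        if available &amp;&amp; !in_use {
            return Some(frame);
        }
    }
    None
}

fn main() {
    let areas = [
        Area { base: 0x0, length: 0x9fc00 },
        Area { base: 0x100000, length: 0x7ee0000 },
    ];
    // Kernel: frames 256 to 282, multiboot structure: frame 285.
    let used = [(256, 282), (285, 285)];
    let limit = (0x100000 + 0x7ee0000) / PAGE_SIZE;
    let mut counter = 0;
    for expected in 0..160 {
        assert_eq!(next_free(&amp;mut counter, &amp;areas, &amp;used, limit), Some(expected));
    }
    // Frames 160 to 255 are not in the memory map, 256 to 282 hold the kernel.
    assert_eq!(next_free(&amp;mut counter, &amp;areas, &amp;used, limit), Some(283));
}
</code></pre>
<p>With these example values, the first 160 calls hand out frames 0 to 159, and the next call skips both the reserved gap below 1MiB and the kernel and returns frame 283.</p> <p>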
The only problem is that we cannot reasonably free frames again, but we will solve that problem in a future post (by adding an intermediate frame stack that saves freed frames).</p> <!--- TODO link future post --> <p>So let’s start implementing our memory map based frame allocator.</p> <h3 id="a-memory-module"><a class="zola-anchor" href="#a-memory-module" aria-label="Anchor link for: a-memory-module">🔗</a>A Memory Module</h3> <p>First we create a memory module with a <code>Frame</code> type (<code>src/memory/mod.rs</code>):</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)] </span><span style="color:#569cd6;">pub struct </span><span>Frame { </span><span> number: </span><span style="color:#569cd6;">usize</span><span>, </span><span>} </span></code></pre> <p>(Don’t forget to add the <code>mod memory</code> line to <code>src/lib.rs</code>.) Instead of e.g. the start address, we just store the frame number. We use <code>usize</code> here since the number of frames depends on the memory size. 
The long <code>derive</code> line makes frames printable and comparable.</p> <p>To make it easy to get the corresponding frame for a physical address, we add a <code>containing_address</code> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub const </span><span style="color:#b4cea8;">PAGE_SIZE</span><span>: </span><span style="color:#569cd6;">usize </span><span>= </span><span style="color:#b5cea8;">4096</span><span>; </span><span> </span><span style="color:#569cd6;">impl </span><span>Frame { </span><span> </span><span style="color:#569cd6;">fn </span><span>containing_address(address: </span><span style="color:#569cd6;">usize</span><span>) -&gt; Frame { </span><span> Frame{ number: address / </span><span style="color:#b4cea8;">PAGE_SIZE </span><span>} </span><span> } </span><span>} </span></code></pre> <p>We also add a <code>FrameAllocator</code> trait:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub trait </span><span>FrameAllocator { </span><span> </span><span style="color:#569cd6;">fn </span><span>allocate_frame(</span><span style="color:#569cd6;">&amp;mut </span><span>self) -&gt; Option&lt;Frame&gt;; </span><span> </span><span style="color:#569cd6;">fn </span><span>deallocate_frame(</span><span style="color:#569cd6;">&amp;mut </span><span>self, frame: Frame); </span><span>} </span></code></pre> <p>This allows us to create another, more advanced frame allocator in the future.</p> <h3 id="the-allocator"><a class="zola-anchor" href="#the-allocator" aria-label="Anchor link for: the-allocator">🔗</a>The Allocator</h3> <p>Now we can put everything together and create the actual frame allocator. For this, we create a <code>src/memory/area_frame_allocator.rs</code> submodule. 
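</p> <p>Before looking at the struct, a quick sanity check of <code>containing_address</code> may be helpful. The following standalone sketch repeats the <code>Frame</code> type from above and feeds it the example addresses printed earlier; the <code>main</code> function and the assertions are only for illustration and are not part of the kernel.</p>
<pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust">const PAGE_SIZE: usize = 4096;

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Frame {
    number: usize,
}

impl Frame {
    fn containing_address(address: usize) -&gt; Frame {
        Frame { number: address / PAGE_SIZE }
    }
}

fn main() {
    // Addresses taken from the example output printed earlier.
    assert_eq!(Frame::containing_address(0x100000).number, 256); // kernel_start
    assert_eq!(Frame::containing_address(0x11a168).number, 282); // kernel_end
    assert_eq!(Frame::containing_address(0x11d400).number, 285); // multiboot_start
    assert_eq!(Frame::containing_address(0x11d9c8).number, 285); // multiboot_end

    // Every address inside one 4096-byte frame maps to the same frame,
    // and the derived ordering follows the frame number.
    assert_eq!(Frame::containing_address(0x1000), Frame::containing_address(0x1fff));
    assert!(Frame::containing_address(0x1fff) &lt; Frame::containing_address(0x2000));
}
</code></pre>
<p>So with these example numbers the kernel occupies frames 256 to 282, and the multiboot information structure fits into the single frame 285.</p> <p>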
The allocator struct looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>memory::{Frame, FrameAllocator}; </span><span style="color:#569cd6;">use </span><span>multiboot2::{MemoryAreaIter, MemoryArea}; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>AreaFrameAllocator { </span><span> next_free_frame: Frame, </span><span> current_area: Option&lt;</span><span style="color:#569cd6;">&amp;&#39;static</span><span> MemoryArea&gt;, </span><span> areas: MemoryAreaIter, </span><span> kernel_start: Frame, </span><span> kernel_end: Frame, </span><span> multiboot_start: Frame, </span><span> multiboot_end: Frame, </span><span>} </span></code></pre> <p>The <code>next_free_frame</code> field is a simple counter that is increased every time we return a frame. It’s initialized to <code>0</code> and every frame below it counts as used. The <code>current_area</code> field holds the memory area that contains <code>next_free_frame</code>. If <code>next_free_frame</code> leaves this area, we will look for the next one in <code>areas</code>. When there are no areas left, all frames are used and <code>current_area</code> becomes <code>None</code>. 
The <code>{kernel, multiboot}_{start, end}</code> fields are used to avoid returning already used frames.</p> <p>To implement the <code>FrameAllocator</code> trait, we need to implement the allocation and deallocation methods:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">impl </span><span>FrameAllocator </span><span style="color:#569cd6;">for </span><span>AreaFrameAllocator { </span><span> </span><span style="color:#569cd6;">fn </span><span>allocate_frame(</span><span style="color:#569cd6;">&amp;mut </span><span>self) -&gt; Option&lt;Frame&gt; { </span><span> </span><span style="color:#608b4e;">// TODO (see below) </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>deallocate_frame(</span><span style="color:#569cd6;">&amp;mut </span><span>self, frame: Frame) { </span><span> </span><span style="color:#608b4e;">// TODO (see below) </span><span> } </span><span>} </span></code></pre> <p>The <code>allocate_frame</code> method looks like this:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in `allocate_frame` in `impl FrameAllocator for AreaFrameAllocator` </span><span> </span><span style="color:#569cd6;">if let </span><span>Some(area) = self.current_area { </span><span> </span><span style="color:#608b4e;">// &quot;Clone&quot; the frame to return it if it&#39;s free. Frame doesn&#39;t </span><span> </span><span style="color:#608b4e;">// implement Clone, but we can construct an identical frame. 
</span><span> </span><span style="color:#569cd6;">let</span><span> frame = Frame{ number: self.next_free_frame.number }; </span><span> </span><span> </span><span style="color:#608b4e;">// the last frame of the current area </span><span> </span><span style="color:#569cd6;">let</span><span> current_area_last_frame = { </span><span> </span><span style="color:#569cd6;">let</span><span> address = area.base_addr + area.length - </span><span style="color:#b5cea8;">1</span><span>; </span><span> Frame::containing_address(address </span><span style="color:#569cd6;">as usize</span><span>) </span><span> }; </span><span> </span><span> </span><span style="color:#569cd6;">if</span><span> frame &gt; current_area_last_frame { </span><span> </span><span style="color:#608b4e;">// all frames of current area are used, switch to next area </span><span> self.choose_next_area(); </span><span> } </span><span style="color:#569cd6;">else if</span><span> frame &gt;= self.kernel_start </span><span style="color:#569cd6;">&amp;&amp;</span><span> frame &lt;= self.kernel_end { </span><span> </span><span style="color:#608b4e;">// `frame` is used by the kernel </span><span> self.next_free_frame = Frame { </span><span> number: self.kernel_end.number + </span><span style="color:#b5cea8;">1 </span><span> }; </span><span> } </span><span style="color:#569cd6;">else if</span><span> frame &gt;= self.multiboot_start </span><span style="color:#569cd6;">&amp;&amp;</span><span> frame &lt;= self.multiboot_end { </span><span> </span><span style="color:#608b4e;">// `frame` is used by the multiboot information structure </span><span> self.next_free_frame = Frame { </span><span> number: self.multiboot_end.number + </span><span style="color:#b5cea8;">1 </span><span> }; </span><span> } </span><span style="color:#569cd6;">else </span><span>{ </span><span> </span><span style="color:#608b4e;">// frame is unused, increment `next_free_frame` and return it </span><span> self.next_free_frame.number += </span><span 
style="color:#b5cea8;">1</span><span>; </span><span> </span><span style="color:#569cd6;">return </span><span>Some(frame); </span><span> } </span><span> </span><span style="color:#608b4e;">// `frame` was not valid, try it again with the updated `next_free_frame` </span><span> self.allocate_frame() </span><span>} </span><span style="color:#569cd6;">else </span><span>{ </span><span> None </span><span style="color:#608b4e;">// no free frames left </span><span>} </span></code></pre> <p>The <code>choose_next_area</code> method isn’t part of the trait and thus goes into a new <code>impl AreaFrameAllocator</code> block:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in `impl AreaFrameAllocator` </span><span> </span><span style="color:#569cd6;">fn </span><span>choose_next_area(</span><span style="color:#569cd6;">&amp;mut </span><span>self) { </span><span> self.current_area = self.areas.clone().filter(|area| { </span><span> </span><span style="color:#569cd6;">let</span><span> address = area.base_addr + area.length - </span><span style="color:#b5cea8;">1</span><span>; </span><span> Frame::containing_address(address </span><span style="color:#569cd6;">as usize</span><span>) &gt;= self.next_free_frame </span><span> }).min_by_key(|area| area.base_addr); </span><span> </span><span> </span><span style="color:#569cd6;">if let </span><span>Some(area) = self.current_area { </span><span> </span><span style="color:#569cd6;">let</span><span> start_frame = Frame::containing_address(area.base_addr </span><span style="color:#569cd6;">as usize</span><span>); </span><span> </span><span style="color:#569cd6;">if </span><span>self.next_free_frame &lt; start_frame { </span><span> self.next_free_frame = start_frame; </span><span> } </span><span> } </span><span>} </span></code></pre> <p>This function chooses the area with the minimal base address that still has free 
frames, i.e. <code>next_free_frame</code> is smaller than its last frame. Note that we need to clone the iterator because the <a href="https://doc.rust-lang.org/nightly/core/iter/trait.Iterator.html#method.min_by_key">min_by_key</a> function consumes it. If there are no areas with free frames left, <code>min_by_key</code> automatically returns the desired <code>None</code>.</p> <p>If the <code>next_free_frame</code> is below the new <code>current_area</code>, it needs to be updated to the area’s start frame. Else, the <code>allocate_frame</code> call could return an unavailable frame.</p> <p>We don’t have a data structure to store free frames, so we can’t implement <code>deallocate_frame</code> reasonably. Thus we use the <code>unimplemented</code> macro, which just panics when the method is called:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">impl </span><span>FrameAllocator </span><span style="color:#569cd6;">for </span><span>AreaFrameAllocator { </span><span> </span><span style="color:#569cd6;">fn </span><span>allocate_frame(</span><span style="color:#569cd6;">&amp;mut </span><span>self) -&gt; Option&lt;Frame&gt; { </span><span> </span><span style="color:#608b4e;">// described above </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>deallocate_frame(</span><span style="color:#569cd6;">&amp;mut </span><span>self, _frame: Frame) { </span><span> unimplemented!() </span><span> } </span><span>} </span></code></pre> <p>Now we only need a constructor function to make the allocator usable:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>new(kernel_start: </span><span style="color:#569cd6;">usize</span><span>, kernel_end: </span><span 
style="color:#569cd6;">usize</span><span>, </span><span> multiboot_start: </span><span style="color:#569cd6;">usize</span><span>, multiboot_end: </span><span style="color:#569cd6;">usize</span><span>, </span><span> memory_areas: MemoryAreaIter) -&gt; AreaFrameAllocator </span><span>{ </span><span> </span><span style="color:#569cd6;">let mut</span><span> allocator = AreaFrameAllocator { </span><span> next_free_frame: Frame::containing_address(</span><span style="color:#b5cea8;">0</span><span>), </span><span> current_area: None, </span><span> areas: memory_areas, </span><span> kernel_start: Frame::containing_address(kernel_start), </span><span> kernel_end: Frame::containing_address(kernel_end), </span><span> multiboot_start: Frame::containing_address(multiboot_start), </span><span> multiboot_end: Frame::containing_address(multiboot_end), </span><span> }; </span><span> allocator.choose_next_area(); </span><span> allocator </span><span>} </span></code></pre> <p>Note that we call <code>choose_next_area</code> manually here because <code>allocate_frame</code> returns <code>None</code> as soon as <code>current_area</code> is <code>None</code>. So by calling <code>choose_next_area</code> we initialize it to the area with the minimal base address.</p> <h3 id="testing-it"><a class="zola-anchor" href="#testing-it" aria-label="Anchor link for: testing-it">🔗</a>Testing it</h3> <p>In order to test it in main, we need to <a href="https://doc.rust-lang.org/1.30.0/book/first-edition/crates-and-modules.html#re-exporting-with-pub-use">re-export</a> the <code>AreaFrameAllocator</code> in the <code>memory</code> module. 
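</p> <p>A re-export along the following lines should do. This is only a sketch: the two modules are written inline so that it compiles on its own (in the project they would correspond to <code>src/memory/mod.rs</code> and <code>src/memory/area_frame_allocator.rs</code>), and the empty struct is a placeholder for the real allocator type.</p>
<pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust">// Stand-in for `src/memory/mod.rs`, written as an inline module.
mod memory {
    // Re-export: callers can now write `memory::AreaFrameAllocator`.
    pub use self::area_frame_allocator::AreaFrameAllocator;

    // Stand-in for `src/memory/area_frame_allocator.rs`.
    mod area_frame_allocator {
        // Placeholder type; the real struct has the fields shown earlier.
        pub struct AreaFrameAllocator;
    }
}

fn main() {
    // The submodule itself stays private, but the re-exported name is reachable.
    let _allocator = memory::AreaFrameAllocator;
}
</code></pre>
<p>The submodule can stay private this way, so the rest of the kernel only sees the <code>memory::AreaFrameAllocator</code> path.</p> <p>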
Then we can create a new allocator:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">let mut</span><span> frame_allocator = memory::AreaFrameAllocator::new( </span><span> kernel_start </span><span style="color:#569cd6;">as usize</span><span>, kernel_end </span><span style="color:#569cd6;">as usize</span><span>, multiboot_start, </span><span> multiboot_end, memory_map_tag.memory_areas()); </span></code></pre> <p>Now we can test it by adding some frame allocations:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{:?}</span><span style="color:#d69d85;">&quot;</span><span>, frame_allocator.allocate_frame()); </span></code></pre> <p>You will see that the frame number starts at <code>0</code> and increases steadily, but the kernel and Multiboot frames are left out (you need to allocate many frames to see this since the kernel starts at frame 256).</p> <p>The following <code>for</code> loop allocates all frames and prints out the total number of allocated frames:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">for</span><span> i </span><span style="color:#569cd6;">in </span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">.. 
</span><span>{ </span><span> </span><span style="color:#569cd6;">if let </span><span>None = frame_allocator.allocate_frame() { </span><span> println!(</span><span style="color:#d69d85;">&quot;allocated </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;"> frames&quot;</span><span>, i); </span><span> </span><span style="color:#569cd6;">break</span><span>; </span><span> } </span><span>} </span></code></pre> <p>You can try different amounts of memory by passing e.g. <code>-m 500M</code> to QEMU. To compare these numbers, <a href="https://www.wolframalpha.com/input/?i=%2832605+*+4096%29+bytes+in+MiB">WolframAlpha</a> can be very helpful.</p> <h2 id="conclusion"><a class="zola-anchor" href="#conclusion" aria-label="Anchor link for: conclusion">🔗</a>Conclusion</h2> <p>Now we have a working frame allocator. It is a bit rudimentary and cannot free frames, but it also is very fast since it reuses the Multiboot memory map and does not need any costly initialization. A future post will build upon this allocator and add a stack-like data structure for freed frames.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>The <a href="https://os.phil-opp.com/page-tables/">next post</a> will be about paging again. We will use the frame allocator to create a safe module that allows us to switch page tables and map pages. Then we will use this module and the information from the Elf-sections tag to remap the kernel correctly.</p> <h2 id="recommended-posts"><a class="zola-anchor" href="#recommended-posts" aria-label="Anchor link for: recommended-posts">🔗</a>Recommended Posts</h2> <p>Eric Kidd started the <a href="http://www.randomhacks.net/bare-metal-rust/">Bare Metal Rust</a> series last week. 
Like this post, it builds upon the code from <a href="https://os.phil-opp.com/printing-to-screen/">Printing to Screen</a>, but tries to support keyboard input instead of wrestling through memory management details.</p> Printing to Screen Fri, 23 Oct 2015 00:00:00 +0000 https://os.phil-opp.com/printing-to-screen/ https://os.phil-opp.com/printing-to-screen/ <p>In the <a href="https://os.phil-opp.com/set-up-rust/">previous post</a> we switched from assembly to <a href="https://www.rust-lang.org/">Rust</a>, a systems programming language that provides great safety. But so far we are using unsafe features like <a href="https://doc.rust-lang.org/book/raw-pointers.html">raw pointers</a> whenever we want to print to screen. In this post we will create a Rust module that provides a safe and easy-to-use interface for the VGA text buffer. It will support Rust’s <a href="https://doc.rust-lang.org/std/fmt/#related-macros">formatting macros</a>, too.</p> <span id="continue-reading"></span> <p>This post uses recent unstable features, so you need an up-to-date nightly compiler. If you have any questions, problems, or suggestions please <a href="https://github.com/phil-opp/blog_os/issues">file an issue</a> or create a comment at the bottom. The code from this post is also available on <a href="https://github.com/phil-opp/blog_os/tree/first_edition_post_4">GitHub</a>.</p> <h2 id="the-vga-text-buffer"><a class="zola-anchor" href="#the-vga-text-buffer" aria-label="Anchor link for: the-vga-text-buffer">🔗</a>The VGA Text Buffer</h2> <p>The text buffer starts at physical address <code>0xb8000</code> and contains the characters displayed on screen. It has 25 rows and 80 columns. 
Each screen character has the following format:</p> <table><thead><tr><th>Bit(s)</th><th>Value</th></tr></thead><tbody> <tr><td>0-7</td><td>ASCII code point</td></tr> <tr><td>8-11</td><td>Foreground color</td></tr> <tr><td>12-14</td><td>Background color</td></tr> <tr><td>15</td><td>Blink</td></tr> </tbody></table> <p>The following colors are available:</p> <table><thead><tr><th>Number</th><th>Color</th><th>Number + Bright Bit</th><th>Bright Color</th></tr></thead><tbody> <tr><td>0x0</td><td>Black</td><td>0x8</td><td>Dark Gray</td></tr> <tr><td>0x1</td><td>Blue</td><td>0x9</td><td>Light Blue</td></tr> <tr><td>0x2</td><td>Green</td><td>0xa</td><td>Light Green</td></tr> <tr><td>0x3</td><td>Cyan</td><td>0xb</td><td>Light Cyan</td></tr> <tr><td>0x4</td><td>Red</td><td>0xc</td><td>Light Red</td></tr> <tr><td>0x5</td><td>Magenta</td><td>0xd</td><td>Pink</td></tr> <tr><td>0x6</td><td>Brown</td><td>0xe</td><td>Yellow</td></tr> <tr><td>0x7</td><td>Light Gray</td><td>0xf</td><td>White</td></tr> </tbody></table> <p>The fourth bit of each color number (value <code>0x8</code>) is the <em>bright bit</em>, which turns, for example, blue into light blue. It is unavailable for the background color because that bit (bit 15 of the screen character) controls whether the text blinks. If you want to use a light background color (e.g. 
white) you have to disable blinking through a <a href="http://www.ctyme.com/intr/rb-0117.htm">BIOS function</a>.</p> <h2 id="a-basic-rust-module"><a class="zola-anchor" href="#a-basic-rust-module" aria-label="Anchor link for: a-basic-rust-module">🔗</a>A basic Rust Module</h2> <p>Now that we know how the VGA buffer works, we can create a Rust module to handle printing:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span style="color:#569cd6;">mod </span><span>vga_buffer; </span></code></pre> <p>The content of this module can live either in <code>src/vga_buffer.rs</code> or <code>src/vga_buffer/mod.rs</code>. The latter supports submodules while the former does not. But our module does not need any submodules so we create it as <code>src/vga_buffer.rs</code>.</p> <p>All of the code below goes into our new module (unless specified otherwise).</p> <h3 id="colors"><a class="zola-anchor" href="#colors" aria-label="Anchor link for: colors">🔗</a>Colors</h3> <p>First, we represent the different colors using an enum:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[allow(dead_code)] </span><span>#[repr(u8)] </span><span style="color:#569cd6;">pub enum </span><span>Color { </span><span> Black = </span><span style="color:#b5cea8;">0</span><span>, </span><span> Blue = </span><span style="color:#b5cea8;">1</span><span>, </span><span> Green = </span><span style="color:#b5cea8;">2</span><span>, </span><span> Cyan = </span><span style="color:#b5cea8;">3</span><span>, </span><span> Red = </span><span style="color:#b5cea8;">4</span><span>, </span><span> Magenta = </span><span style="color:#b5cea8;">5</span><span>, </span><span> Brown = </span><span style="color:#b5cea8;">6</span><span>, </span><span> LightGray = </span><span 
style="color:#b5cea8;">7</span><span>, </span><span> DarkGray = </span><span style="color:#b5cea8;">8</span><span>, </span><span> LightBlue = </span><span style="color:#b5cea8;">9</span><span>, </span><span> LightGreen = </span><span style="color:#b5cea8;">10</span><span>, </span><span> LightCyan = </span><span style="color:#b5cea8;">11</span><span>, </span><span> LightRed = </span><span style="color:#b5cea8;">12</span><span>, </span><span> Pink = </span><span style="color:#b5cea8;">13</span><span>, </span><span> Yellow = </span><span style="color:#b5cea8;">14</span><span>, </span><span> White = </span><span style="color:#b5cea8;">15</span><span>, </span><span>} </span></code></pre> <p>We use a <a href="https://doc.rust-lang.org/rust-by-example/custom_types/enum/c_like.html">C-like enum</a> here to explicitly specify the number for each color. Because of the <code>repr(u8)</code> attribute each enum variant is stored as an <code>u8</code>. Actually 4 bits would be sufficient, but Rust doesn’t have an <code>u4</code> type.</p> <p>Normally the compiler would issue a warning for each unused variant. 
By using the <code>#[allow(dead_code)]</code> attribute we disable these warnings for the <code>Color</code> enum.</p> <p>To represent a full color code that specifies foreground and background color, we create a newtype on top of <code>u8</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">struct </span><span>ColorCode(</span><span style="color:#569cd6;">u8</span><span>); </span><span> </span><span style="color:#569cd6;">impl </span><span>ColorCode { </span><span> </span><span style="color:#569cd6;">const fn </span><span>new(foreground: Color, background: Color) -&gt; ColorCode { </span><span> ColorCode((background </span><span style="color:#569cd6;">as u8</span><span>) &lt;&lt; </span><span style="color:#b5cea8;">4 </span><span style="color:#569cd6;">| </span><span>(foreground </span><span style="color:#569cd6;">as u8</span><span>)) </span><span> } </span><span>} </span></code></pre> <p>The <code>ColorCode</code> contains the full color byte, containing foreground and background color. Blinking is enabled implicitly by using a bright background color (soon we will disable blinking anyway). The <code>new</code> function is a <a href="https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md">const function</a> to allow it in static initializers. 
As <code>const</code> functions are unstable we need to add the <code>const_fn</code> feature in <code>src/lib.rs</code>.</p> <h3 id="the-text-buffer"><a class="zola-anchor" href="#the-text-buffer" aria-label="Anchor link for: the-text-buffer">🔗</a>The Text Buffer</h3> <p>Now we can add structures to represent a screen character and the text buffer:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[repr(C)] </span><span style="color:#569cd6;">struct </span><span>ScreenChar { </span><span> ascii_character: </span><span style="color:#569cd6;">u8</span><span>, </span><span> color_code: ColorCode, </span><span>} </span><span> </span><span style="color:#569cd6;">const </span><span style="color:#b4cea8;">BUFFER_HEIGHT</span><span>: </span><span style="color:#569cd6;">usize </span><span>= </span><span style="color:#b5cea8;">25</span><span>; </span><span style="color:#569cd6;">const </span><span style="color:#b4cea8;">BUFFER_WIDTH</span><span>: </span><span style="color:#569cd6;">usize </span><span>= </span><span style="color:#b5cea8;">80</span><span>; </span><span> </span><span style="color:#569cd6;">struct </span><span>Buffer { </span><span> chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT], </span><span>} </span></code></pre> <p>Since the field ordering in default structs is undefined in Rust, we need the <a href="https://doc.rust-lang.org/nightly/nomicon/other-reprs.html#reprc">repr(C)</a> attribute. 
It guarantees that the struct’s fields are laid out exactly like in a C struct and thus guarantees the correct field ordering.</p> <p>To actually write to screen, we now create a writer type:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>core::ptr::Unique; </span><span> </span><span style="color:#569cd6;">pub struct </span><span>Writer { </span><span> column_position: </span><span style="color:#569cd6;">usize</span><span>, </span><span> color_code: ColorCode, </span><span> buffer: Unique&lt;Buffer&gt;, </span><span>} </span></code></pre> <p>The writer will always write to the last line and shift lines up when a line is full (or on <code>\n</code>). The <code>column_position</code> field keeps track of the current position in the last row. The current foreground and background colors are specified by <code>color_code</code> and a pointer to the VGA buffer is stored in <code>buffer</code>. To make it possible to create a <code>static</code> Writer later, the <code>buffer</code> field stores an <code>Unique&lt;Buffer&gt;</code> instead of a plain <code>*mut Buffer</code>. <a href="https://doc.rust-lang.org/1.10.0/core/ptr/struct.Unique.html">Unique</a> is a wrapper that implements Send/Sync and is thus usable as a <code>static</code>. Since it’s unstable, you may need to add the <code>unique</code> feature to <code>lib.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span>#![feature(unique)] </span></code></pre> <h2 id="printing-characters"><a class="zola-anchor" href="#printing-characters" aria-label="Anchor link for: printing-characters">🔗</a>Printing Characters</h2> <p>Now we can use the <code>Writer</code> to modify the buffer’s characters. 
First we create a method to write a single ASCII byte (it doesn’t compile yet):</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">impl </span><span>Writer { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>write_byte(</span><span style="color:#569cd6;">&amp;mut </span><span>self, byte: </span><span style="color:#569cd6;">u8</span><span>) { </span><span> </span><span style="color:#569cd6;">match</span><span> byte { </span><span> </span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&#39; </span><span style="color:#569cd6;">=&gt; </span><span>self.new_line(), </span><span> byte </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#569cd6;">if </span><span>self.column_position &gt;= </span><span style="color:#b4cea8;">BUFFER_WIDTH </span><span>{ </span><span> self.new_line(); </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> row = </span><span style="color:#b4cea8;">BUFFER_HEIGHT </span><span>- </span><span style="color:#b5cea8;">1</span><span>; </span><span> </span><span style="color:#569cd6;">let</span><span> col = self.column_position; </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> color_code = self.color_code; </span><span> self.buffer().chars[row][col] = ScreenChar { </span><span> ascii_character: byte, </span><span> color_code: color_code, </span><span> }; </span><span> self.column_position += </span><span style="color:#b5cea8;">1</span><span>; </span><span> } </span><span> } </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>buffer(</span><span style="color:#569cd6;">&amp;mut </span><span>self) -&gt; </span><span style="color:#569cd6;">&amp;mut</span><span> 
Buffer { </span><span> </span><span style="color:#569cd6;">unsafe</span><span>{ self.buffer.as_mut() } </span><span> } </span><span> </span><span> </span><span style="color:#569cd6;">fn </span><span>new_line(</span><span style="color:#569cd6;">&amp;mut </span><span>self) {</span><span style="color:#608b4e;">/* TODO */</span><span>} </span><span>} </span></code></pre> <p>If the byte is the <a href="https://en.wikipedia.org/wiki/Newline">newline</a> byte <code>\n</code>, the writer does not print anything. Instead it calls a <code>new_line</code> method, which we’ll implement later. Other bytes get printed to the screen in the second match case.</p> <p>When printing a byte, the writer checks if the current line is full. In that case, it first calls <code>new_line</code> to wrap the line. Then it writes a new <code>ScreenChar</code> to the buffer at the current position. Finally, the current column position is advanced.</p> <p>The <code>buffer()</code> auxiliary method converts the raw pointer in the <code>buffer</code> field into a safe mutable buffer reference. The unsafe block is needed because the <a href="https://doc.rust-lang.org/1.26.0/core/ptr/struct.Unique.html#method.as_mut">as_mut()</a> method of <code>Unique</code> is unsafe. But our <code>buffer()</code> method itself isn’t marked as unsafe, so it must not introduce any unsafety (e.g. cause segfaults). To guarantee that, it’s very important that the <code>buffer</code> field always points to a valid <code>Buffer</code>. It’s like a contract that we must uphold every time we create a <code>Writer</code>. 
To ensure that it’s not possible to create an invalid <code>Writer</code> from outside of the module, the struct must have at least one private field and public creation functions are not allowed either.</p> <h3 id="cannot-move-out-of-borrowed-content"><a class="zola-anchor" href="#cannot-move-out-of-borrowed-content" aria-label="Anchor link for: cannot-move-out-of-borrowed-content">🔗</a>Cannot Move out of Borrowed Content</h3> <p>When we try to compile it, we get the following error:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error[E0507]: cannot move out of borrowed content </span><span> --&gt; src/vga_buffer.rs:79:34 </span><span> | </span><span>79 | let color_code = self.color_code; </span><span> | ^^^^ cannot move out of borrowed content </span></code></pre> <p>The reason is that Rust <em>moves</em> values by default instead of copying them like other languages. And we cannot move <code>color_code</code> out of <code>self</code> because we only borrowed <code>self</code>. For more information check out the <a href="https://doc.rust-lang.org/book/ownership.html">ownership section</a> in the Rust book.</p> <p>To fix it, we can implement the <a href="https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html">Copy</a> trait for the <code>ColorCode</code> type. 
The easiest way to do this is to use the built-in <a href="https://doc.rust-lang.org/rust-by-example/custom_types/enum/c_like.html">derive macro</a>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[derive(Debug, Clone, Copy)] </span><span style="color:#569cd6;">struct </span><span>ColorCode(</span><span style="color:#569cd6;">u8</span><span>); </span></code></pre> <p>We also derive the <a href="https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html">Clone</a> trait, since it’s a requirement for <code>Copy</code>, and the <a href="https://doc.rust-lang.org/nightly/core/fmt/trait.Debug.html">Debug</a> trait, which allows us to print this field for debugging purposes.</p> <p>Now our project should compile again.</p> <p>However, the <a href="https://doc.rust-lang.org/core/marker/trait.Copy.html#when-should-my-type-be-copy">documentation for Copy</a> says: <em>“if your type can implement Copy, it should”</em>. 
Therefore we also derive Copy for <code>Color</code> and <code>ScreenChar</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[allow(dead_code)] </span><span>#[derive(Debug, Clone, Copy)] </span><span>#[repr(u8)] </span><span style="color:#569cd6;">pub enum </span><span>Color {</span><span style="color:#569cd6;">...</span><span>} </span><span> </span><span>#[derive(Debug, Clone, Copy)] </span><span>#[repr(C)] </span><span style="color:#569cd6;">struct </span><span>ScreenChar {...} </span></code></pre> <h3 id="try-it-out"><a class="zola-anchor" href="#try-it-out" aria-label="Anchor link for: try-it-out">🔗</a>Try it out!</h3> <p>To write some characters to the screen, you can create a temporary function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub fn </span><span>print_something() { </span><span> </span><span style="color:#569cd6;">let mut</span><span> writer = Writer { </span><span> column_position: </span><span style="color:#b5cea8;">0</span><span>, </span><span> color_code: ColorCode::new(Color::LightGreen, Color::Black), </span><span> buffer: </span><span style="color:#569cd6;">unsafe </span><span>{ Unique::new_unchecked(</span><span style="color:#b5cea8;">0xb8000 </span><span style="color:#569cd6;">as *mut _</span><span>) }, </span><span> }; </span><span> </span><span> writer.write_byte(</span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39;H&#39;</span><span>); </span><span>} </span></code></pre> <p>It just creates a new Writer that points to the VGA buffer at <code>0xb8000</code>. 
To use the unstable <code>Unique::new_unchecked</code> function, we need to add the feature flag <code>#![feature(const_unique_new)]</code> to the top of our <code>src/lib.rs</code>.</p> <p>Then it writes the byte <code>b'H'</code> to it. The <code>b</code> prefix creates a <a href="https://doc.rust-lang.org/reference/tokens.html#characters-and-strings">byte character</a>, which represents an ASCII code point. When we call <code>vga_buffer::print_something</code> in main, an <code>H</code> should be printed in the <em>lower</em> left corner of the screen in light green:</p> <p><img src="https://os.phil-opp.com/printing-to-screen/vga-H-lower-left.png" alt="QEMU output with a green H in the lower left corner" /></p> <h3 id="volatile"><a class="zola-anchor" href="#volatile" aria-label="Anchor link for: volatile">🔗</a>Volatile</h3> <p>We just saw that our <code>H</code> was printed correctly. However, it might not work with future Rust compilers that optimize more aggressively.</p> <p>The problem is that we only write to the <code>Buffer</code> and never read from it again. The compiler doesn’t know about the side effect that some characters appear on the screen. So it might decide that these writes are unnecessary and can be omitted.</p> <p>To avoid this erroneous optimization, we need to specify these writes as <em><a href="https://en.wikipedia.org/wiki/Volatile_(computer_programming)">volatile</a></em>. This tells the compiler that the write has side effects and should not be optimized away.</p> <p>In order to use volatile writes for the VGA buffer, we use the <a href="https://docs.rs/volatile">volatile</a> library. This <em>crate</em> (this is what packages are called in the Rust world) provides a <code>Volatile</code> wrapper type with <code>read</code> and <code>write</code> methods. 
These methods internally use the <a href="https://doc.rust-lang.org/nightly/core/ptr/fn.read_volatile.html">read_volatile</a> and <a href="https://doc.rust-lang.org/nightly/core/ptr/fn.write_volatile.html">write_volatile</a> functions of the standard library and thus guarantee that the reads/writes are not optimized away.</p> <p>We can add a dependency on the <code>volatile</code> crate by adding it to the <code>dependencies</code> section of our <code>Cargo.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span> </span><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">volatile </span><span>= </span><span style="color:#d69d85;">&quot;0.1.0&quot; </span></code></pre> <p>The <code>0.1.0</code> is the <a href="https://semver.org/">semantic</a> version number. For more information, see the <a href="https://doc.crates.io/specifying-dependencies.html">Specifying Dependencies</a> guide of the cargo documentation.</p> <p>Now we’ve declared that our project depends on the <code>volatile</code> crate and are able to import it in <code>src/lib.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span style="color:#569cd6;">extern crate</span><span> volatile; </span></code></pre> <p>Let’s use it to make writes to the VGA buffer volatile. 
We update our <code>Buffer</code> type as follows:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span style="color:#569cd6;">use </span><span>volatile::Volatile; </span><span> </span><span style="color:#569cd6;">struct </span><span>Buffer { </span><span> chars: [[Volatile&lt;ScreenChar&gt;; BUFFER_WIDTH]; BUFFER_HEIGHT], </span><span>} </span></code></pre> <p>Instead of a <code>ScreenChar</code>, we’re now using a <code>Volatile&lt;ScreenChar&gt;</code>. (The <code>Volatile</code> type is <a href="https://doc.rust-lang.org/book/second-edition/ch10-00-generics.html">generic</a> and can wrap (almost) any type). This ensures that we can’t accidentally write to it through a “normal” write. Instead, we have to use the <code>write</code> method now.</p> <p>This means that we have to update our <code>Writer::write_byte</code> method:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">impl </span><span>Writer { </span><span> </span><span style="color:#569cd6;">pub fn </span><span>write_byte(</span><span style="color:#569cd6;">&amp;mut </span><span>self, byte: </span><span style="color:#569cd6;">u8</span><span>) { </span><span> </span><span style="color:#569cd6;">match</span><span> byte { </span><span> </span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&#39; </span><span style="color:#569cd6;">=&gt; </span><span>self.new_line(), </span><span> byte </span><span style="color:#569cd6;">=&gt; </span><span>{ </span><span> </span><span style="color:#569cd6;">... 
</span><span> </span><span> self.buffer().chars[row][col].write(ScreenChar { </span><span> ascii_character: byte, </span><span> color_code: color_code, </span><span> }); </span><span> </span><span style="color:#569cd6;">... </span><span> } </span><span> } </span><span> } </span><span> </span><span style="color:#569cd6;">... </span><span>} </span></code></pre> <p>Instead of a normal assignment using <code>=</code>, we’re now using the <code>write</code> method. This guarantees that the compiler will never optimize away this write.</p> <h2 id="printing-strings"><a class="zola-anchor" href="#printing-strings" aria-label="Anchor link for: printing-strings">🔗</a>Printing Strings</h2> <p>To print whole strings, we can convert them to bytes and print them one-by-one:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in `impl Writer` </span><span style="color:#569cd6;">pub fn </span><span>write_str(</span><span style="color:#569cd6;">&amp;mut </span><span>self, s: </span><span style="color:#569cd6;">&amp;str</span><span>) { </span><span> </span><span style="color:#569cd6;">for</span><span> byte </span><span style="color:#569cd6;">in</span><span> s.bytes() { </span><span> self.write_byte(byte) </span><span> } </span><span>} </span></code></pre> <p>You can try it yourself in the <code>print_something</code> function.</p> <p>When you print strings with some special characters like <code>ä</code> or <code>λ</code>, you’ll notice that they cause weird symbols on screen. That’s because they are represented by multiple bytes in <a href="https://www.fileformat.info/info/unicode/utf8.htm">UTF-8</a>. By converting them to bytes, we of course get strange results. 
But since the VGA buffer doesn’t support UTF-8, it’s not possible to display these characters anyway.</p> <h3 id="support-formatting-macros"><a class="zola-anchor" href="#support-formatting-macros" aria-label="Anchor link for: support-formatting-macros">🔗</a>Support Formatting Macros</h3> <p>It would be nice to support Rust’s formatting macros, too. That way, we can easily print different types like integers or floats. To support them, we need to implement the <a href="https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html">core::fmt::Write</a> trait. The only required method of this trait is <code>write_str</code>, which looks quite similar to our <code>write_str</code> method. To implement the trait, we just need to move it into an <code>impl fmt::Write for Writer</code> block and add a return type:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">use </span><span>core::fmt; </span><span> </span><span style="color:#569cd6;">impl </span><span>fmt::Write </span><span style="color:#569cd6;">for </span><span>Writer { </span><span> </span><span style="color:#569cd6;">fn </span><span>write_str(</span><span style="color:#569cd6;">&amp;mut </span><span>self, s: </span><span style="color:#569cd6;">&amp;str</span><span>) -&gt; fmt::Result { </span><span> </span><span style="color:#569cd6;">for</span><span> byte </span><span style="color:#569cd6;">in</span><span> s.bytes() { </span><span> self.write_byte(byte) </span><span> } </span><span> Ok(()) </span><span> } </span><span>} </span></code></pre> <p>The <code>Ok(())</code> is just an <code>Ok</code> Result containing the <code>()</code> type. 
We can drop the <code>pub</code> because trait methods are always public.</p> <p>Now we can use Rust’s built-in <code>write!</code>/<code>writeln!</code> formatting macros:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in the `print_something` function </span><span style="color:#569cd6;">use </span><span>core::fmt::Write; </span><span style="color:#569cd6;">let mut</span><span> writer = Writer {</span><span style="color:#569cd6;">...</span><span>}; </span><span>writer.write_byte(</span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39;H&#39;</span><span>); </span><span>writer.write_str(</span><span style="color:#d69d85;">&quot;ello! &quot;</span><span>); </span><span>write!(writer, </span><span style="color:#d69d85;">&quot;The numbers are </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;"> and </span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#b5cea8;">42</span><span>, </span><span style="color:#b5cea8;">1.0</span><span>/</span><span style="color:#b5cea8;">3.0</span><span>); </span></code></pre> <p>Now you should see a <code>Hello! The numbers are 42 and 0.3333333333333333</code> at the bottom of the screen.</p> <h3 id="newlines"><a class="zola-anchor" href="#newlines" aria-label="Anchor link for: newlines">🔗</a>Newlines</h3> <p>Right now, we just ignore newlines and characters that don’t fit into the line anymore. Instead we want to move every character one line up (the top line gets deleted) and start at the beginning of the last line again. 
To do this, we add an implementation for the <code>new_line</code> method of <code>Writer</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in `impl Writer` </span><span> </span><span style="color:#569cd6;">fn </span><span>new_line(</span><span style="color:#569cd6;">&amp;mut </span><span>self) { </span><span> </span><span style="color:#569cd6;">for</span><span> row </span><span style="color:#569cd6;">in </span><span style="color:#b5cea8;">1</span><span style="color:#569cd6;">..</span><span style="color:#b4cea8;">BUFFER_HEIGHT </span><span>{ </span><span> </span><span style="color:#569cd6;">for</span><span> col </span><span style="color:#569cd6;">in </span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b4cea8;">BUFFER_WIDTH </span><span>{ </span><span> </span><span style="color:#569cd6;">let</span><span> buffer = self.buffer(); </span><span> </span><span style="color:#569cd6;">let</span><span> character = buffer.chars[row][col].read(); </span><span> buffer.chars[row - </span><span style="color:#b5cea8;">1</span><span>][col].write(character); </span><span> } </span><span> } </span><span> self.clear_row(</span><span style="color:#b4cea8;">BUFFER_HEIGHT</span><span>-</span><span style="color:#b5cea8;">1</span><span>); </span><span> self.column_position = </span><span style="color:#b5cea8;">0</span><span>; </span><span>} </span><span> </span><span style="color:#569cd6;">fn </span><span>clear_row(</span><span style="color:#569cd6;">&amp;mut </span><span>self, row: </span><span style="color:#569cd6;">usize</span><span>) {</span><span style="color:#608b4e;">/* TODO */</span><span>} </span></code></pre> <p>We iterate over all screen characters and move each character one row up. Note that the range notation (<code>..</code>) is exclusive of the upper bound. 
We also omit the 0th row (the first range starts at <code>1</code>) because it’s the row that is shifted off screen.</p> <p>Now we only need to implement the <code>clear_row</code> method to finish the newline code:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in `impl Writer` </span><span style="color:#569cd6;">fn </span><span>clear_row(</span><span style="color:#569cd6;">&amp;mut </span><span>self, row: </span><span style="color:#569cd6;">usize</span><span>) { </span><span> </span><span style="color:#569cd6;">let</span><span> blank = ScreenChar { </span><span> ascii_character: </span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&#39; &#39;</span><span>, </span><span> color_code: self.color_code, </span><span> }; </span><span> </span><span style="color:#569cd6;">for</span><span> col </span><span style="color:#569cd6;">in </span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b4cea8;">BUFFER_WIDTH </span><span>{ </span><span> self.buffer().chars[row][col].write(blank); </span><span> } </span><span>} </span></code></pre> <p>This method clears a row by overwriting all of its characters with a space character.</p> <h2 id="providing-an-interface"><a class="zola-anchor" href="#providing-an-interface" aria-label="Anchor link for: providing-an-interface">🔗</a>Providing an Interface</h2> <p>To provide a global writer that can be used as an interface from other modules, we can add a <code>static</code> writer:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub static </span><span style="color:#b4cea8;">WRITER</span><span>: Writer = Writer { </span><span> column_position: </span><span style="color:#b5cea8;">0</span><span>, </span><span> color_code: 
ColorCode::new(Color::LightGreen, Color::Black), </span><span> buffer: </span><span style="color:#569cd6;">unsafe </span><span>{ Unique::new_unchecked(</span><span style="color:#b5cea8;">0xb8000 </span><span style="color:#569cd6;">as *mut _</span><span>) }, </span><span>}; </span></code></pre> <p>But we can’t use it to print anything! You can try it yourself in the <code>print_something</code> function. The reason is that we try to take a mutable reference (<code>&amp;mut</code>) to an immutable <code>static</code> when calling <code>WRITER.write_byte</code>.</p> <p>To resolve it, we could use a <a href="https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable">mutable static</a>. But then every read and write to it would be unsafe since it could easily introduce data races and other bad things. Using <code>static mut</code> is highly discouraged; there are even proposals to <a href="https://internals.rust-lang.org/t/pre-rfc-remove-static-mut/1437">remove it</a>.</p> <p>But what are the alternatives? We could try to use a cell type like <a href="https://doc.rust-lang.org/nightly/core/cell/struct.RefCell.html">RefCell</a> or even <a href="https://doc.rust-lang.org/nightly/core/cell/struct.UnsafeCell.html">UnsafeCell</a> to provide <a href="https://doc.rust-lang.org/1.30.0/book/first-edition/mutability.html#interior-vs-exterior-mutability">interior mutability</a>. But these types aren’t <a href="https://doc.rust-lang.org/nightly/core/marker/trait.Sync.html">Sync</a> (with good reason), so we can’t use them in statics.</p> <p>To get synchronized interior mutability, users of the standard library can use <a href="https://doc.rust-lang.org/nightly/std/sync/struct.Mutex.html">Mutex</a>. It provides mutual exclusion by blocking threads when the resource is already locked. But our basic kernel does not have any blocking support or even a concept of threads, so we can’t use it either. 
However there is a really basic kind of mutex in computer science that requires no operating system features: the <a href="https://en.wikipedia.org/wiki/Spinlock">spinlock</a>. Instead of blocking, the threads simply try to lock it again and again in a tight loop and thus burn CPU time until the mutex is free again.</p> <p>To use a spinning mutex, we can add the <a href="https://crates.io/crates/spin">spin crate</a> as a dependency:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#608b4e;"># in Cargo.toml </span><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">rlibc </span><span>= </span><span style="color:#d69d85;">&quot;0.1.4&quot; </span><span style="color:#569cd6;">spin </span><span>= </span><span style="color:#d69d85;">&quot;0.4.5&quot; </span></code></pre> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span style="color:#569cd6;">extern crate</span><span> spin; </span></code></pre> <p>Then we can use the spinning Mutex to add interior mutability to our static writer:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs again </span><span style="color:#569cd6;">use </span><span>spin::Mutex; </span><span style="color:#569cd6;">... 
</span><span style="color:#569cd6;">pub static </span><span style="color:#b4cea8;">WRITER</span><span>: Mutex&lt;Writer&gt; = Mutex::new(Writer { </span><span> column_position: </span><span style="color:#b5cea8;">0</span><span>, </span><span> color_code: ColorCode::new(Color::LightGreen, Color::Black), </span><span> buffer: </span><span style="color:#569cd6;">unsafe </span><span>{ Unique::new_unchecked(</span><span style="color:#b5cea8;">0xb8000 </span><span style="color:#569cd6;">as *mut _</span><span>) }, </span><span>}); </span></code></pre> <p><a href="https://docs.rs/spin/0.4.5/spin/struct.Mutex.html#method.new">Mutex::new</a> is a const function, too, so it can be used in statics.</p> <p>Now we can easily print from our main function:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span style="color:#569cd6;">pub extern fn </span><span>rust_main() { </span><span> </span><span style="color:#569cd6;">use </span><span>core::fmt::Write; </span><span> vga_buffer::</span><span style="color:#b4cea8;">WRITER</span><span>.lock().write_str(</span><span style="color:#d69d85;">&quot;Hello again&quot;</span><span>); </span><span> write!(vga_buffer::WRITER.lock(), </span><span style="color:#d69d85;">&quot;, some numbers: </span><span style="color:#b4cea8;">{} {}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#b5cea8;">42</span><span>, </span><span style="color:#b5cea8;">1.337</span><span>); </span><span> </span><span style="color:#569cd6;">loop</span><span>{} </span><span>} </span></code></pre> <p>Note that we need to import the <code>Write</code> trait if we want to use its functions.</p> <h2 id="a-println-macro"><a class="zola-anchor" href="#a-println-macro" aria-label="Anchor link for: a-println-macro">🔗</a>A println macro</h2> <p>Rust’s <a 
href="https://doc.rust-lang.org/nightly/book/second-edition/appendix-04-macros.html">macro syntax</a> is a bit strange, so we won’t try to write a macro from scratch. Instead we look at the source of the <a href="https://doc.rust-lang.org/nightly/std/macro.println!.html"><code>println!</code> macro</a> in the standard library:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>macro_rules! println { </span><span> ($fmt:</span><span style="color:#569cd6;">expr</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>(print!(concat!($fmt, </span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>))); </span><span> ($fmt:</span><span style="color:#569cd6;">expr</span><span>, </span><span style="color:#569cd6;">$</span><span>($arg:</span><span style="color:#569cd6;">tt</span><span>)</span><span style="color:#569cd6;">*</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>(print!(concat!($fmt, </span><span style="color:#d69d85;">&quot;</span><span style="color:#e3bbab;">\n</span><span style="color:#d69d85;">&quot;</span><span>), </span><span style="color:#569cd6;">$</span><span>($arg)*)); </span><span>} </span></code></pre> <p>Macros are defined through one or more rules, which are similar to <code>match</code> arms. The <code>println</code> macro has two rules: The first rule is for invocations with a single argument (e.g. <code>println!("Hello")</code>) and the second rule is for invocations with additional parameters (e.g. 
<code>println!("{}{}", 4, 2)</code>).</p> <p>Both rules simply append a newline character (<code>\n</code>) to the format string and then invoke the <a href="https://doc.rust-lang.org/nightly/std/macro.print!.html"><code>print!</code> macro</a>, which is defined as:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>macro_rules! print { </span><span> (</span><span style="color:#569cd6;">$</span><span>($arg:</span><span style="color:#569cd6;">tt</span><span>)</span><span style="color:#569cd6;">*</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>($crate::io::_print(format_args!(</span><span style="color:#569cd6;">$</span><span>($arg)*))); </span><span>} </span></code></pre> <p>The macro expands to a call of the <a href="https://github.com/rust-lang/rust/blob/46d39f3329487115e7d7dcd37bc64eea6ef9ba4e/src/libstd/io/stdio.rs#L631"><code>_print</code> function</a> in the <code>io</code> module. The <a href="https://doc.rust-lang.org/1.30.0/book/first-edition/macros.html#the-variable-crate"><code>$crate</code> variable</a> ensures that the macro also works from outside the <code>std</code> crate. For example, it expands to <code>::std</code> when it’s used in other crates.</p> <p>The <a href="https://doc.rust-lang.org/nightly/std/macro.format_args.html"><code>format_args</code> macro</a> builds a <a href="https://doc.rust-lang.org/nightly/core/fmt/struct.Arguments.html">fmt::Arguments</a> type from the passed arguments, which is passed to <code>_print</code>. The <a href="https://github.com/rust-lang/rust/blob/46d39f3329487115e7d7dcd37bc64eea6ef9ba4e/src/libstd/io/stdio.rs#L631"><code>_print</code> function</a> of libstd is rather complicated, as it supports different <code>Stdout</code> devices. 
We don’t need that complexity since we just want to print to the VGA buffer.</p> <p>To print to the VGA buffer, we just copy the <code>println!</code> macro and modify the <code>print!</code> macro to use our static <code>WRITER</code> instead of <code>_print</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span>macro_rules! print { </span><span> (</span><span style="color:#569cd6;">$</span><span>($arg:</span><span style="color:#569cd6;">tt</span><span>)</span><span style="color:#569cd6;">*</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>({ </span><span> </span><span style="color:#569cd6;">use </span><span>core::fmt::Write; </span><span> </span><span style="color:#569cd6;">let mut</span><span> writer = $crate::vga_buffer::</span><span style="color:#b4cea8;">WRITER</span><span>.lock(); </span><span> writer.write_fmt(format_args!(</span><span style="color:#569cd6;">$</span><span>($arg)*)).unwrap(); </span><span> }); </span><span>} </span></code></pre> <p>Instead of a <code>_print</code> function, we call the <code>write_fmt</code> method of our static <code>Writer</code>. Since we’re using a method from the <code>Write</code> trait, we need to import it before. The additional <code>unwrap()</code> at the end panics if printing isn’t successful. But since we always return <code>Ok</code> in <code>write_str</code>, that should not happen.</p> <p>Note the additional <code>{}</code> scope around the macro: We write <code>=&gt; ({…})</code> instead of <code>=&gt; (…)</code>. 
The additional <code>{}</code> prevents the <code>Write</code> trait from being silently imported into the parent scope when <code>print</code> is used.</p> <h3 id="clearing-the-screen"><a class="zola-anchor" href="#clearing-the-screen" aria-label="Anchor link for: clearing-the-screen">🔗</a>Clearing the screen</h3> <p>We can now use <code>println!</code> to add a rather trivial function to clear the screen:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span style="color:#569cd6;">pub fn </span><span>clear_screen() { </span><span> </span><span style="color:#569cd6;">for _ in </span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b4cea8;">BUFFER_HEIGHT </span><span>{ </span><span> println!(</span><span style="color:#d69d85;">&quot;&quot;</span><span>); </span><span> } </span><span>} </span></code></pre> <h3 id="hello-world-using-println"><a class="zola-anchor" href="#hello-world-using-println" aria-label="Anchor link for: hello-world-using-println">🔗</a>Hello World using <code>println</code></h3> <p>To use <code>println</code> in <code>lib.rs</code>, we need to import the macros of the VGA buffer module first. 
Therefore we add a <code>#[macro_use]</code> attribute to the module declaration:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/lib.rs </span><span> </span><span>#[macro_use] </span><span style="color:#569cd6;">mod </span><span>vga_buffer; </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern fn </span><span>rust_main() { </span><span> </span><span style="color:#608b4e;">// ATTENTION: we have a very small stack and no guard page </span><span> vga_buffer::clear_screen(); </span><span> println!(</span><span style="color:#d69d85;">&quot;Hello World</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>); </span><span> </span><span> </span><span style="color:#569cd6;">loop</span><span>{} </span><span>} </span></code></pre> <p>Since we imported the macros at crate level, they are available in all modules and thus provide an easy and safe interface to the VGA buffer.</p> <p>As expected, we now see a <em>“Hello World!”</em> on a cleared screen:</p> <p><img src="https://os.phil-opp.com/printing-to-screen/vga-hello-world.png" alt="QEMU printing “Hello World!” on a cleared screen" /></p> <h3 id="deadlocks"><a class="zola-anchor" href="#deadlocks" aria-label="Anchor link for: deadlocks">🔗</a>Deadlocks</h3> <p>Whenever we use locks, we must be careful to not accidentally introduce <em>deadlocks</em>. A <a href="https://en.wikipedia.org/wiki/Deadlock">deadlock</a> occurs when a thread/program waits for a lock that will never be released. Normally, this happens when multiple threads access multiple locks. 
For example, it happens when thread A holds lock 1 and tries to acquire lock 2 and – at the same time – thread B holds lock 2 and tries to acquire lock 1.</p> <p>However, a deadlock can also occur when a thread tries to acquire the same lock twice. This way we can trigger a deadlock in our VGA driver:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in rust_main in src/lib.rs </span><span> </span><span>println!(</span><span style="color:#d69d85;">&quot;</span><span style="color:#b4cea8;">{}</span><span style="color:#d69d85;">&quot;</span><span>, { println!(</span><span style="color:#d69d85;">&quot;inner&quot;</span><span>); </span><span style="color:#d69d85;">&quot;outer&quot; </span><span>}); </span></code></pre> <p>The argument passed to <code>println</code> is a new block that resolves to the string <em>“outer”</em> (a block always returns the result of the last expression). But before returning “outer”, the block tries to print the string <em>“inner”</em>.</p> <p>When we try this code in QEMU, we see that neither of the strings is printed. To understand what’s happening, we take a look at our <code>print</code> macro again:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>macro_rules! 
print { </span><span> (</span><span style="color:#569cd6;">$</span><span>($arg:</span><span style="color:#569cd6;">tt</span><span>)</span><span style="color:#569cd6;">*</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>({ </span><span> </span><span style="color:#569cd6;">use </span><span>core::fmt::Write; </span><span> </span><span style="color:#569cd6;">let mut</span><span> writer = $crate::vga_buffer::</span><span style="color:#b4cea8;">WRITER</span><span>.lock(); </span><span> writer.write_fmt(format_args!(</span><span style="color:#569cd6;">$</span><span>($arg)*)).unwrap(); </span><span> }); </span><span>} </span></code></pre> <p>So we <em>first</em> lock the <code>WRITER</code> and then we evaluate the arguments using <code>format_args</code>. The problem is that the argument in our code example contains another <code>println</code>, which tries to lock the <code>WRITER</code> again. So now the inner <code>println</code> waits for the outer <code>println</code> and vice versa. Thus, a deadlock occurs and the CPU spins endlessly.</p> <h3 id="fixing-the-deadlock"><a class="zola-anchor" href="#fixing-the-deadlock" aria-label="Anchor link for: fixing-the-deadlock">🔗</a>Fixing the Deadlock</h3> <p>In order to fix the deadlock, we need to evaluate the arguments <em>before</em> locking the <code>WRITER</code>. We can do so by moving the locking and printing logic into a new <code>print</code> function (like it’s done in the standard library):</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#608b4e;">// in src/vga_buffer.rs </span><span> </span><span>macro_rules! 
print { </span><span> (</span><span style="color:#569cd6;">$</span><span>($arg:</span><span style="color:#569cd6;">tt</span><span>)</span><span style="color:#569cd6;">*</span><span>) </span><span style="color:#569cd6;">=&gt; </span><span>({ </span><span> $crate::vga_buffer::print(format_args!(</span><span style="color:#569cd6;">$</span><span>($arg)*)); </span><span> }); </span><span>} </span><span> </span><span style="color:#569cd6;">pub fn </span><span>print(args: fmt::Arguments) { </span><span> </span><span style="color:#569cd6;">use </span><span>core::fmt::Write; </span><span> </span><span style="color:#b4cea8;">WRITER</span><span>.lock().write_fmt(args).unwrap(); </span><span>} </span></code></pre> <p>Now the macro only evaluates the arguments (through <code>format_args!</code>) and passes them to the new <code>print</code> function. The <code>print</code> function then locks the <code>WRITER</code> and prints the formatting arguments using <code>write_fmt</code>. So now the arguments are evaluated before locking the <code>WRITER</code>.</p> <p>Thus, we fixed the deadlock:</p> <p><img src="https://os.phil-opp.com/printing-to-screen/fixed-println-deadlock.png" alt="QEMU printing “inner” and then “outer”" /></p> <p>We see that both “inner” and “outer” are printed.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>In the next posts we will map the kernel pages correctly so that accessing <code>0x0</code> or writing to <code>.rodata</code> is not possible anymore. To obtain the loaded kernel sections we will read the Multiboot information structure. 
Then we will create a paging module and use it to switch to a new page table where the kernel sections are mapped correctly.</p> <p>The <a href="https://os.phil-opp.com/allocating-frames/">next post</a> describes the Multiboot information structure and creates a frame allocator using the information about memory areas.</p> <h2 id="other-rust-os-projects"><a class="zola-anchor" href="#other-rust-os-projects" aria-label="Anchor link for: other-rust-os-projects">🔗</a>Other Rust OS Projects</h2> <p>Now that you know the very basics of OS development in Rust, you should also check out the following projects:</p> <ul> <li> <p><a href="https://github.com/thepowersgang/rust-barebones-kernel">Rust Bare-Bones Kernel</a>: A basic kernel with roughly the same functionality as ours. Writes output to the serial port instead of the VGA buffer and maps the kernel to the <a href="https://wiki.osdev.org/Higher_Half_Kernel">higher half</a> (instead of our identity mapping). <em>Note</em>: You need to <a href="https://os.phil-opp.com/cross-compile-binutils/">cross compile binutils</a> to build it (or you create some symbolic links<sup class="footnote-reference"><a href="#fn-symlink">1</a></sup> if you’re on x86_64).</p> </li> <li> <p><a href="https://github.com/RustOS-Fork-Holding-Ground/RustOS">RustOS</a>: More advanced kernel that supports allocation, keyboard inputs, and threads. It also has a scheduler and a basic network driver.</p> </li> <li> <p><a href="https://github.com/thepowersgang/rust_os">“Tifflin” Experimental Kernel</a>: Big kernel project by thepowersgang, that is actively developed and has over 650 commits. It has a separate userspace and supports multiple file systems, even a GUI is included. Needs a cross compiler.</p> </li> <li> <p><a href="https://github.com/redox-os/redox">Redox</a>: Probably the most complete Rust OS today. It has an active community and over 1000 Github stars. File systems, network, an audio player, a picture viewer, and much more. 
Just take a look at the <a href="https://github.com/redox-os/redox#what-it-looks-like">screenshots</a>.</p> </li> </ul> <h2 id="footnotes"><a class="zola-anchor" href="#footnotes" aria-label="Anchor link for: footnotes">🔗</a>Footnotes</h2> <div class="footnote-definition" id="fn-symlink"><sup class="footnote-definition-label">1</sup> <p>You will need to symlink <code>x86_64-none_elf-XXX</code> to <code>/usr/bin/XXX</code> where <code>XXX</code> is in {<code>as</code>, <code>ld</code>, <code>objcopy</code>, <code>objdump</code>, <code>strip</code>}. The <code>x86_64-none_elf-XXX</code> files must be in some folder that is in your <code>$PATH</code>. But then you can only build for your x86_64 host architecture, so use this hack only for testing.</p> </div> Set Up Rust Wed, 02 Sep 2015 00:00:00 +0000 https://os.phil-opp.com/set-up-rust/ https://os.phil-opp.com/set-up-rust/ <p>In the previous posts we created a <a href="https://os.phil-opp.com/multiboot-kernel/">minimal Multiboot kernel</a> and <a href="https://os.phil-opp.com/entering-longmode/">switched to Long Mode</a>. Now we can finally switch to <a href="https://www.rust-lang.org/">Rust</a> code. Rust is a high-level language without runtime. It allows us to not link the standard library and write bare metal code. Unfortunately the setup is not quite hassle-free yet.</p> <span id="continue-reading"></span> <p>This blog post tries to set up Rust step-by-step and point out the different problems. If you have any questions, problems, or suggestions please <a href="https://github.com/phil-opp/blog_os/issues">file an issue</a> or create a comment at the bottom. The code from this post is in a <a href="https://github.com/phil-opp/blog_os/tree/first_edition_post_3">Github repository</a>, too.</p> <h2 id="installing-rust"><a class="zola-anchor" href="#installing-rust" aria-label="Anchor link for: installing-rust">🔗</a>Installing Rust</h2> <p>We need a nightly compiler, as we will use many unstable features. 
To manage Rust installations I highly recommend <a href="https://www.rustup.rs/">rustup</a>. It allows you to install nightly, beta, and stable compilers side-by-side and makes it easy to update them. To use a nightly compiler for the current directory, you can run <code>rustup override add nightly</code>. Alternatively, you can add a file called <code>rust-toolchain</code> to the project’s root directory:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>nightly </span></code></pre> <h2 id="creating-a-cargo-project"><a class="zola-anchor" href="#creating-a-cargo-project" aria-label="Anchor link for: creating-a-cargo-project">🔗</a>Creating a Cargo project</h2> <p><a href="https://doc.crates.io/guide.html">Cargo</a> is Rust’s excellent package manager. Normally you would call <code>cargo new</code> when you want to create a new project folder. We can’t use it because our folder already exists, so we need to do it manually. Fortunately we only need to add a cargo configuration file named <code>Cargo.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span>[</span><span style="color:#808080;">package</span><span>] </span><span style="color:#569cd6;">name </span><span>= </span><span style="color:#d69d85;">&quot;blog_os&quot; </span><span style="color:#569cd6;">version </span><span>= </span><span style="color:#d69d85;">&quot;0.1.0&quot; </span><span style="color:#569cd6;">authors </span><span>= [</span><span style="color:#d69d85;">&quot;Philipp Oppermann &lt;[email protected]&gt;&quot;</span><span>] </span><span> </span><span>[</span><span style="color:#808080;">lib</span><span>] </span><span style="color:#569cd6;">crate-type </span><span>= [</span><span style="color:#d69d85;">&quot;staticlib&quot;</span><span>] </span></code></pre> <p>The <code>package</code> section contains required project metadata such as the <a 
href="https://doc.rust-lang.org/cargo/reference/manifest.html#the-package-section">semantic crate version</a>. The <code>lib</code> section specifies that we want to build a static library, i.e. a library that contains all of its dependencies. This is required to link the Rust project with our kernel.</p> <p>Now we place our root source file in <code>src/lib.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#![feature(lang_items)] </span><span>#![no_std] </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern fn </span><span>rust_main() {} </span><span> </span><span>#[lang </span><span style="color:#569cd6;">= </span><span style="color:#d69d85;">&quot;eh_personality&quot;</span><span>] #[no_mangle] </span><span style="color:#569cd6;">pub extern fn </span><span>eh_personality() {} </span><span>#[lang </span><span style="color:#569cd6;">= </span><span style="color:#d69d85;">&quot;panic_fmt&quot;</span><span>] #[no_mangle] </span><span style="color:#569cd6;">pub extern fn </span><span>panic_fmt() -&gt; </span><span style="color:#569cd6;">! </span><span>{</span><span style="color:#569cd6;">loop</span><span>{}} </span></code></pre> <p>Let’s break it down:</p> <ul> <li><code>#!</code> defines an <a href="https://doc.rust-lang.org/book/attributes.html">attribute</a> of the current module. Since we are at the root module, the attributes apply to the crate itself.</li> <li>The <code>feature</code> attribute is used to allow the specified <em>feature-gated</em> attributes in this crate. You can’t do that in a stable/beta compiler, so this is one reason we need a Rust nightly.</li> <li>The <code>no_std</code> attribute prevents the automatic linking of the standard library. We can’t use <code>std</code> because it relies on operating system features like files, system calls, and various device drivers. 
Remember that currently the only “feature” of our OS is printing <code>OKAY</code> :).</li> <li>A <code>#</code> without a <code>!</code> afterwards defines an attribute for the <em>following</em> item (a function in our case).</li> <li>The <code>no_mangle</code> attribute disables the automatic <a href="https://en.wikipedia.org/wiki/Name_mangling">name mangling</a> that Rust uses to get unique function names. We want to do a <code>call rust_main</code> from our assembly code, so this function name must stay as it is.</li> <li>We mark our main function as <code>extern</code> to make it compatible with the standard C <a href="https://en.wikipedia.org/wiki/Calling_convention">calling convention</a>.</li> <li>The <code>lang</code> attribute defines a Rust <a href="https://doc.rust-lang.org/1.10.0/book/lang-items.html">language item</a>.</li> <li>The <code>eh_personality</code> function is used for Rust’s <a href="https://doc.rust-lang.org/nomicon/unwinding.html">unwinding</a> on <code>panic!</code>. We can leave it empty since we don’t have any unwinding support in our OS yet.</li> <li>The <code>panic_fmt</code> function is the entry point on panic. Right now we can’t do anything useful, so we just make sure that it doesn’t return (required by the <code>!</code> return type).</li> </ul> <h2 id="building-rust"><a class="zola-anchor" href="#building-rust" aria-label="Anchor link for: building-rust">🔗</a>Building Rust</h2> <p>We can now build it using <code>cargo build</code>, which creates a static library at <code>target/debug/libblog_os.a</code>. However, the resulting library is specific to our <em>host</em> operating system. 
This is undesirable, because our target system might be different.</p> <p>Let’s define some properties of our target system:</p> <ul> <li><strong>x86_64</strong>: Our target CPU is a recent <code>x86_64</code> CPU.</li> <li><strong>No operating system</strong>: Our target does not run any operating system (we’re currently writing it), so the compiler should not assume any OS-specific functionality.</li> <li><strong>Handles hardware interrupts</strong>: We’re writing a kernel, so we’ll need to handle asynchronous hardware interrupts at some point. This means that we have to disable a certain stack pointer optimization (the so-called <a href="https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64#the-red-zone">red zone</a>), because it would cause stack corruptions otherwise.</li> <li><strong>No SSE</strong>: Our target might not have <a href="https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</a> support. Even if it does, we probably don’t want to use SSE instructions in our kernel, because it makes interrupt handling much slower. We will explain this in detail in the <a href="https://os.phil-opp.com/handling-exceptions/">“Handling Exceptions”</a> post.</li> <li><strong>No hardware floats</strong>: The <code>x86_64</code> architecture uses SSE instructions for floating point operations, which we don’t want to use (see the previous point). So we also need to avoid hardware floating point operations in our kernel. Instead, we will use <em>soft floats</em>, which are basically software functions that emulate floating point operations using normal integers.</li> </ul> <h3 id="target-specifications"><a class="zola-anchor" href="#target-specifications" aria-label="Anchor link for: target-specifications">🔗</a>Target Specifications</h3> <p>Rust allows us to define <a href="https://doc.rust-lang.org/1.1.0/rustc_back/target/">custom targets</a> through a JSON configuration file. 
A minimal target specification equal to <code>x86_64-unknown-linux-gnu</code> (the default 64-bit Linux target) looks like this:</p> <pre data-lang="json" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-json "><code class="language-json" data-lang="json"><span>{ </span><span> </span><span style="color:#d69d85;">&quot;llvm-target&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64-unknown-linux-gnu&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;data-layout&quot;</span><span>: </span><span style="color:#d69d85;">&quot;e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;linker-flavor&quot;</span><span>: </span><span style="color:#d69d85;">&quot;gcc&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-endian&quot;</span><span>: </span><span style="color:#d69d85;">&quot;little&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-pointer-width&quot;</span><span>: </span><span style="color:#d69d85;">&quot;64&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-c-int-width&quot;</span><span>: </span><span style="color:#d69d85;">&quot;32&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;arch&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;os&quot;</span><span>: </span><span style="color:#d69d85;">&quot;linux&quot; </span><span>} </span></code></pre> <p>The <code>llvm-target</code> field specifies the target triple that is passed to LLVM. 
<a href="https://llvm.org/docs/LangRef.html#target-triple">Target triples</a> are a naming convention that defines the CPU architecture (e.g., <code>x86_64</code> or <code>arm</code>), the vendor (e.g., <code>apple</code> or <code>unknown</code>), the operating system (e.g., <code>windows</code> or <code>linux</code>), and the <a href="https://en.wikipedia.org/wiki/Application_binary_interface">ABI</a> (e.g., <code>gnu</code> or <code>msvc</code>). For example, the target triple for 64-bit Linux is <code>x86_64-unknown-linux-gnu</code> and for 32-bit Windows the target triple is <code>i686-pc-windows-msvc</code>.</p> <p>The <code>data-layout</code> field is also passed to LLVM and specifies how data should be laid out in memory. It consists of various specifications separated by a <code>-</code> character. For example, the <code>e</code> means little endian and <code>S128</code> specifies that the stack should be 128 bits (= 16 bytes) aligned. The format is described in detail in the <a href="https://llvm.org/docs/LangRef.html#data-layout">LLVM documentation</a> but there shouldn’t be a reason to change this string.</p> <p>The <code>linker-flavor</code> field was recently introduced in <a href="https://github.com/rust-lang/rust/pull/40018">#40018</a> with the intention of adding support for the LLVM linker <a href="https://lld.llvm.org/">LLD</a>, which is platform-independent. In the future, this might allow easy cross compilation without the need to install a gcc cross compiler for linking.</p> <p>The other fields are used for conditional compilation. This allows crate authors to use <code>cfg</code> variables to write special code depending on the OS or the architecture. 
There isn’t any up-to-date documentation about these fields but the <a href="https://github.com/rust-lang/rust/blob/c772948b687488a087356cb91432425662e034b9/src/librustc_back/target/mod.rs#L194-L214">corresponding source code</a> is quite readable.</p> <h3 id="a-kernel-target-specification"><a class="zola-anchor" href="#a-kernel-target-specification" aria-label="Anchor link for: a-kernel-target-specification">🔗</a>A Kernel Target Specification</h3> <p>For our target system, we define the following JSON configuration in a file named <code>x86_64-blog_os.json</code>:</p> <pre data-lang="json" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-json "><code class="language-json" data-lang="json"><span>{ </span><span> </span><span style="color:#d69d85;">&quot;llvm-target&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64-unknown-none&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;data-layout&quot;</span><span>: </span><span style="color:#d69d85;">&quot;e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;linker-flavor&quot;</span><span>: </span><span style="color:#d69d85;">&quot;gcc&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-endian&quot;</span><span>: </span><span style="color:#d69d85;">&quot;little&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-pointer-width&quot;</span><span>: </span><span style="color:#d69d85;">&quot;64&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;target-c-int-width&quot;</span><span>: </span><span style="color:#d69d85;">&quot;32&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;arch&quot;</span><span>: </span><span style="color:#d69d85;">&quot;x86_64&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;os&quot;</span><span>: 
</span><span style="color:#d69d85;">&quot;none&quot;</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;disable-redzone&quot;</span><span>: </span><span style="color:#569cd6;">true</span><span>, </span><span> </span><span style="color:#d69d85;">&quot;features&quot;</span><span>: </span><span style="color:#d69d85;">&quot;-mmx,-sse,+soft-float&quot; </span><span>} </span></code></pre> <p>As <code>llvm-target</code> we use <code>x86_64-unknown-none</code>, which defines the <code>x86_64</code> architecture, an <code>unknown</code> vendor, and no operating system (<code>none</code>). The ABI doesn’t matter for us, so we just leave it off. The <code>data-layout</code> field is just copied from the <code>x86_64-unknown-linux-gnu</code> target. We also use the same values for the <code>target-endian</code>, <code>target-pointer-width</code>, <code>target-c-int-width</code>, and <code>arch</code> fields. For the <code>os</code> field we choose <code>none</code>, since our kernel runs on bare metal.</p> <h4 id="the-red-zone"><a class="zola-anchor" href="#the-red-zone" aria-label="Anchor link for: the-red-zone">🔗</a>The Red Zone</h4> <p>The <a href="https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64#the-red-zone">red zone</a> is an optimization of the <a href="https://wiki.osdev.org/System_V_ABI">System V ABI</a> that allows functions to temporarily use the 128 bytes below their stack frame without adjusting the stack pointer:</p> <p><img src="https://os.phil-opp.com/set-up-rust/red-zone.svg" alt="stack frame with red zone" /></p> <p>The image shows the stack frame of a function with <code>n</code> local variables. On function entry, the stack pointer is adjusted to make room on the stack for the local variables.</p> <p>The red zone is defined as the 128 bytes below the adjusted stack pointer. The function can use this area for temporary data that’s not needed across function calls. 
Thus, the two instructions for adjusting the stack pointer can be avoided in some cases (e.g. in small leaf functions).</p> <p>However, this optimization leads to huge problems with exceptions or hardware interrupts. Let’s assume that an exception occurs while a function uses the red zone:</p> <p><img src="https://os.phil-opp.com/set-up-rust/red-zone-overwrite.svg" alt="red zone overwritten by exception handler" /></p> <p>The CPU and the exception handler overwrite the data in the red zone. But this data is still needed by the interrupted function. So the function won’t work correctly anymore when we return from the exception handler. This might lead to strange bugs that <a href="https://forum.osdev.org/viewtopic.php?t=21720">take weeks to debug</a>.</p> <p>To avoid such bugs when we implement exception handling in the future, we disable the red zone right from the beginning. This is achieved by adding the <code>"disable-redzone": true</code> line to our target configuration file.</p> <h4 id="simd-extensions"><a class="zola-anchor" href="#simd-extensions" aria-label="Anchor link for: simd-extensions">🔗</a>SIMD Extensions</h4> <p>The <code>features</code> field enables/disables target features. We disable the <code>mmx</code> and <code>sse</code> features by prefixing them with a minus and enable the <code>soft-float</code> feature by prefixing it with a plus. The <code>mmx</code> and <code>sse</code> features determine support for <a href="https://en.wikipedia.org/wiki/SIMD">Single Instruction Multiple Data (SIMD)</a> instructions, which simultaneously perform an operation (e.g. addition) on multiple data words. The <code>x86</code> architecture supports the following standards:</p> <ul> <li><a href="https://en.wikipedia.org/wiki/MMX_(instruction_set)">MMX</a>: The <em>Multi Media Extension</em> instruction set was introduced in 1997 and defines eight 64-bit registers called <code>mm0</code> through <code>mm7</code>. 
These registers are just aliases for the registers of the <a href="https://en.wikipedia.org/wiki/X87">x87 floating point unit</a>.</li> <li><a href="https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</a>: The <em>Streaming SIMD Extensions</em> instruction set was introduced in 1999. Instead of re-using the floating point registers, it adds a completely new register set. The sixteen new registers are called <code>xmm0</code> through <code>xmm15</code> and are 128 bits each.</li> <li><a href="https://en.wikipedia.org/wiki/Advanced_Vector_Extensions">AVX</a>: The <em>Advanced Vector Extensions</em> are extensions that further increase the size of the multimedia registers. The new registers are called <code>ymm0</code> through <code>ymm15</code> and are 256 bits each. They extend the <code>xmm</code> registers, so e.g. <code>xmm0</code> is the lower half of <code>ymm0</code>.</li> </ul> <p>By using such SIMD standards, programs can often speed up significantly. Good compilers are able to transform normal loops into such SIMD code automatically through a process called <a href="https://en.wikipedia.org/wiki/Automatic_vectorization">auto-vectorization</a>.</p> <p>However, the large SIMD registers lead to problems in OS kernels. The reason is that the kernel has to back up all registers that it uses on each hardware interrupt (we will look into this in the <a href="https://os.phil-opp.com/handling-exceptions/">“Handling Exceptions”</a> post). So if the kernel uses SIMD registers, it has to back up a lot more data, which noticeably decreases performance. To avoid this performance loss, we disable the <code>sse</code> and <code>mmx</code> features (the <code>avx</code> feature is disabled by default).</p> <p>As noted above, floating point operations on <code>x86_64</code> use SSE registers, so floats are no longer usable without SSE. 
Unfortunately, the Rust core library already uses floats (e.g., it implements traits for <code>f32</code> and <code>f64</code>), so we need an alternative way to implement float operations. The <code>soft-float</code> feature solves this problem by emulating all floating point operations through software functions based on normal integers.</p> <h3 id="compiling"><a class="zola-anchor" href="#compiling" aria-label="Anchor link for: compiling">🔗</a>Compiling</h3> <p>To build our kernel for our new target, we pass the configuration file’s name as the <code>--target</code> argument. There is currently an <a href="https://github.com/rust-lang/cargo/issues/4905">open bug</a> for custom target specifications, so you also need to set the <code>RUST_TARGET_PATH</code> environment variable to the current directory; otherwise, Rust doesn’t find your target. The full command is:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>RUST_TARGET_PATH=$(pwd) cargo build --target x86_64-blog_os </span></code></pre> <p>However, the following error occurs:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>error[E0463]: can&#39;t find crate for `core` </span><span> | </span><span> = note: the `x86_64-blog_os` target may not be installed </span></code></pre> <p>The error tells us that the Rust compiler no longer finds the core library. The <a href="https://doc.rust-lang.org/nightly/core/index.html">core library</a> is implicitly linked to all <code>no_std</code> crates and contains things such as <code>Result</code>, <code>Option</code>, and iterators.</p> <p>The problem is that the core library is distributed together with the Rust compiler as a <em>precompiled</em> library. So it is only valid for the host triple (e.g., <code>x86_64-unknown-linux-gnu</code>) but not for our custom target. 
If we want to compile code for other targets, we need to recompile <code>core</code> for these targets first.</p> <h4 id="xargo"><a class="zola-anchor" href="#xargo" aria-label="Anchor link for: xargo">🔗</a>Xargo</h4> <p>That’s where <a href="https://github.com/japaric/xargo">xargo</a> comes in. It is a wrapper for cargo that eases cross compilation. We can install it by executing:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>cargo install xargo </span></code></pre> <p>Xargo depends on the rust source code, which we can install with <code>rustup component add rust-src</code>.</p> <p>Xargo is “a drop-in replacement for cargo”, so every cargo command also works with <code>xargo</code>. You can do e.g. <code>xargo --help</code>, <code>xargo clean</code>, or <code>xargo doc</code>. However, the <code>build</code> command gains additional functionality: <code>xargo build</code> will automatically cross compile the <code>core</code> library when compiling for custom targets.</p> <p>Let’s try it:</p> <pre data-lang="bash" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-bash "><code class="language-bash" data-lang="bash"><span>&gt; RUST_TARGET_PATH=$(pwd) xargo build --target=x86_64-blog_os </span><span> Compiling core v0.0.0 (file:///…/rust/src/libcore) </span><span> Finished release </span><span style="color:#569cd6;">[</span><span>optimized</span><span style="color:#569cd6;">]</span><span> target(s) in 22.87 secs </span><span> Compiling blog_os v0.1.0 (file:///…/blog_os/tags) </span><span> Finished dev </span><span style="color:#569cd6;">[</span><span>unoptimized + debuginfo</span><span style="color:#569cd6;">]</span><span> target(s) in 0.29 secs </span></code></pre> <p>It worked! We see that <code>xargo</code> cross-compiled the <code>core</code> library for our new custom target and then continued to compile our <code>blog_os</code> crate. 
After compilation, we can find a static library at <code>target/x86_64-blog_os/debug/libblog_os.a</code>, which can be linked with our assembly kernel.</p> <h2 id="integrating-rust"><a class="zola-anchor" href="#integrating-rust" aria-label="Anchor link for: integrating-rust">🔗</a>Integrating Rust</h2> <p>Let’s try to integrate our Rust library into our assembly kernel so that we can call the <code>rust_main</code> function. For that we need to pass the <code>libblog_os.a</code> file to the linker, together with the assembly object files.</p> <h3 id="adjusting-the-makefile"><a class="zola-anchor" href="#adjusting-the-makefile" aria-label="Anchor link for: adjusting-the-makefile">🔗</a>Adjusting the Makefile</h3> <p>To build and link the rust library on <code>make</code>, we extend our <code>Makefile</code>(<a href="https://github.com/phil-opp/blog_os/blob/first_edition_post_3/Makefile">full file</a>):</p> <pre data-lang="make" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-make "><code class="language-make" data-lang="make"><span style="color:#608b4e;"># ... </span><span>target ?= </span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">arch</span><span style="background-color:#282828;color:#569cd6;">)</span><span style="background-color:#282828;color:#d69d85;">-blog_os</span><span> </span><span>rust_os := </span><span style="background-color:#282828;color:#d69d85;">target/</span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">target</span><span style="background-color:#282828;color:#569cd6;">)</span><span style="background-color:#282828;color:#d69d85;">/debug/libblog_os.a</span><span> </span><span style="color:#608b4e;"># ... </span><span>.PHONY: </span><span style="background-color:#282828;color:#d69d85;">all clean run iso kernel</span><span> </span><span style="color:#608b4e;"># ... 
</span><span style="color:#569cd6;">$(</span><span>kernel</span><span style="color:#569cd6;">)</span><span>: </span><span style="background-color:#282828;color:#d69d85;">kernel </span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">rust_os</span><span style="background-color:#282828;color:#569cd6;">) $(</span><span style="background-color:#282828;color:#dcdcdc;">assembly_object_files</span><span style="background-color:#282828;color:#569cd6;">) $(</span><span style="background-color:#282828;color:#dcdcdc;">linker_script</span><span style="background-color:#282828;color:#569cd6;">)</span><span> </span><span> </span><span style="color:#569cd6;">@</span><span>ld -n -T </span><span style="color:#569cd6;">$(</span><span>linker_script</span><span style="color:#569cd6;">)</span><span> -o </span><span style="color:#569cd6;">$(</span><span>kernel</span><span style="color:#569cd6;">) </span><span>\ </span><span> </span><span style="color:#569cd6;">$(</span><span>assembly_object_files</span><span style="color:#569cd6;">) $(</span><span>rust_os</span><span style="color:#569cd6;">) </span><span> </span><span>kernel: </span><span> </span><span style="color:#569cd6;">@</span><span>RUST_TARGET_PATH=</span><span style="color:#569cd6;">$(</span><span>shell pwd</span><span style="color:#569cd6;">) </span><span>xargo build --target </span><span style="color:#569cd6;">$(</span><span>target</span><span style="color:#569cd6;">) </span></code></pre> <p>We add a new <code>kernel</code> target that just executes <code>xargo build</code> and modify the <code>$(kernel)</code> target to link the created static lib. We also add the new <code>kernel</code> target to the <code>.PHONY</code> list, since it does not belong to a file with that name.</p> <p>But now <code>xargo build</code> is executed on every <code>make</code>, even if no source file was changed. 
And the ISO is recreated on every <code>make iso</code>/<code>make run</code>, too. We could try to avoid this by adding dependencies on all Rust source and cargo configuration files to the <code>kernel</code> target, but the ISO creation takes only half a second on my machine and most of the time we will have changed a Rust file when we run <code>make</code>. So we keep it simple for now and let cargo do the bookkeeping of changed files (it does it anyway).</p> <h3 id="calling-rust"><a class="zola-anchor" href="#calling-rust" aria-label="Anchor link for: calling-rust">🔗</a>Calling Rust</h3> <p>Now we can call the Rust main function from <code>long_mode_start</code>:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>bits 64 </span><span>long_mode_start: </span><span> ... </span><span> </span><span style="color:#608b4e;"> ; call the rust main </span><span> extern rust_main</span><span style="color:#608b4e;"> ; new </span><span> </span><span style="color:#569cd6;">call </span><span>rust_main</span><span style="color:#608b4e;"> ; new </span><span> </span><span style="color:#608b4e;"> ; print `OKAY` to screen </span><span> </span><span style="color:#569cd6;">mov </span><span>rax, </span><span style="color:#b4cea8;">0x2f592f412f4b2f4f </span><span> </span><span style="color:#569cd6;">mov </span><span>qword [</span><span style="color:#b4cea8;">0xb8000</span><span>], rax </span><span> </span><span style="color:#569cd6;">hlt </span></code></pre> <p>By declaring <code>rust_main</code> as <code>extern</code> we tell nasm that the function is defined in another file. As the linker takes care of linking them together, we’ll get a linker error if we have a typo in the name or forget to mark the Rust function as <code>pub extern</code>.</p> <p>If we’ve done everything right, we should still see the green <code>OKAY</code> when executing <code>make run</code>. 
That means that we successfully called the Rust function and returned back to assembly.</p> <h3 id="fixing-linker-errors"><a class="zola-anchor" href="#fixing-linker-errors" aria-label="Anchor link for: fixing-linker-errors">🔗</a>Fixing Linker Errors</h3> <p>Now we can try some Rust code:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">pub extern fn </span><span>rust_main() { </span><span> </span><span style="color:#569cd6;">let</span><span> x = [</span><span style="color:#d69d85;">&quot;Hello&quot;</span><span>, </span><span style="color:#d69d85;">&quot;World&quot;</span><span>, </span><span style="color:#d69d85;">&quot;!&quot;</span><span>]; </span><span> </span><span style="color:#569cd6;">let</span><span> y = x; </span><span>} </span></code></pre> <p>When we test it using <code>make run</code>, it fails with <code>undefined reference to 'memcpy'</code>. The <code>memcpy</code> function is one of the basic functions of the C library (<code>libc</code>). Usually the <code>libc</code> crate is linked to every Rust program together with the standard library, but we opted out through <code>#![no_std]</code>. We could try to fix this by adding the <a href="https://doc.rust-lang.org/1.10.0/libc/index.html">libc crate</a> as <code>extern crate</code>. But <code>libc</code> is just a wrapper for the system <code>libc</code>, for example <code>glibc</code> on Linux, so this won’t work for us. Instead we need to recreate the basic <code>libc</code> functions such as <code>memcpy</code>, <code>memmove</code>, <code>memset</code>, and <code>memcmp</code> in Rust.</p> <h4 id="rlibc"><a class="zola-anchor" href="#rlibc" aria-label="Anchor link for: rlibc">🔗</a>rlibc</h4> <p>Fortunately there already is a crate for that: <a href="https://crates.io/crates/rlibc">rlibc</a>. 
When we look at its <a href="https://github.com/alexcrichton/rlibc/blob/defb486e765846417a8e73329e8c5196f1dca49a/src/lib.rs">source code</a> we see that it contains no magic, just some <a href="https://doc.rust-lang.org/book/raw-pointers.html">raw pointer</a> operations in a while loop. To add <code>rlibc</code> as a dependency we just need to add two lines to the <code>Cargo.toml</code>:</p> <pre data-lang="toml" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-toml "><code class="language-toml" data-lang="toml"><span style="color:#ff3333;">... </span><span>[</span><span style="color:#808080;">dependencies</span><span>] </span><span style="color:#569cd6;">rlibc </span><span>= </span><span style="color:#d69d85;">&quot;1.0&quot; </span></code></pre> <p>and an <code>extern crate</code> definition in our <code>src/lib.rs</code>:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#569cd6;">... </span><span style="color:#569cd6;">extern crate</span><span> rlibc; </span><span> </span><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern fn </span><span>rust_main() { </span><span style="color:#569cd6;">... </span></code></pre> <p>Now <code>make run</code> doesn’t complain about <code>memcpy</code> anymore. Instead it will show a pile of new ugly linker errors:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>target/x86_64-blog_os/debug/libblog_os.a(core-92335f822fa6c9a6.0.o): </span><span> In function `_$LT$f32$u20$as$u20$core..num..dec2flt.. </span><span> rawfp..RawFloat$GT$::from_int::h50f7952efac3fdca&#39;: </span><span> core.cgu-0.rs:(.text._ZN59_$LT$f32$u20$as$u20$core..num..dec2flt.. 
</span><span> rawfp..RawFloat$GT$8from_int17h50f7952efac3fdcaE+0x2): </span><span> undefined reference to `__floatundisf&#39; </span><span>target/x86_64-blog_os/debug/libblog_os.a(core-92335f822fa6c9a6.0.o): </span><span> In function `_$LT$f64$u20$as$u20$core..num..dec2flt..rawfp.. </span><span> RawFloat$GT$::from_int::h12a81f175246914a&#39;: </span><span> core.cgu-0.rs:(.text._ZN59_$LT$f64$u20$as$u20$core..num..dec2flt..rawfp.. </span><span> RawFloat$GT$8from_int17h12a81f175246914aE+0x2): </span><span> undefined reference to `__floatundidf&#39; </span><span>target/x86_64-blog_os/debug/libblog_os.a(core-92335f822fa6c9a6.0.o): </span><span> In function `core::num::from_str_radix::h09b12650704e0508&#39;: </span><span> core.cgu-0.rs:(.text._ZN4core3num14from_str_radix </span><span> 17h09b12650704e0508E+0xcf): </span><span> undefined reference to `__muloti4&#39; </span><span>... </span></code></pre> <h4 id="gc-sections"><a class="zola-anchor" href="#gc-sections" aria-label="Anchor link for: gc-sections">🔗</a>--gc-sections</h4> <p>The new errors are linker errors about various missing functions such as <code>__floatundisf</code> or <code>__muloti4</code>. These functions are part of LLVM’s <a href="https://compiler-rt.llvm.org/"><code>compiler-rt</code> builtins</a> and are normally linked by the standard library. For <code>no_std</code> crates like ours, one has to link the <code>compiler-rt</code> library manually. Unfortunately, this library is implemented in C and the build process is a bit cumbersome. Alternatively, there is the <a href="https://github.com/rust-lang-nursery/compiler-builtins">compiler-builtins</a> crate that tries to port the library to Rust, but it isn’t complete yet.</p> <p>In our case, there is a much simpler solution, since our kernel doesn’t really need any of those functions yet. So we can just tell the linker to remove unused program sections and hopefully all references to these functions will disappear. 
Removing unused sections is generally a good idea as it reduces kernel size. The magic linker flag for this is <code>--gc-sections</code>, which stands for “garbage collect sections”. Let’s add it to the <code>$(kernel)</code> target in our <code>Makefile</code>:</p> <pre data-lang="make" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-make "><code class="language-make" data-lang="make"><span style="color:#569cd6;">$(</span><span>kernel</span><span style="color:#569cd6;">)</span><span>: </span><span style="background-color:#282828;color:#d69d85;">kernel </span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">rust_os</span><span style="background-color:#282828;color:#569cd6;">) $(</span><span style="background-color:#282828;color:#dcdcdc;">assembly_object_files</span><span style="background-color:#282828;color:#569cd6;">) $(</span><span style="background-color:#282828;color:#dcdcdc;">linker_script</span><span style="background-color:#282828;color:#569cd6;">)</span><span> </span><span> </span><span style="color:#569cd6;">@</span><span>ld -n --gc-sections -T </span><span style="color:#569cd6;">$(</span><span>linker_script</span><span style="color:#569cd6;">)</span><span> -o </span><span style="color:#569cd6;">$(</span><span>kernel</span><span style="color:#569cd6;">) </span><span>\ </span><span> </span><span style="color:#569cd6;">$(</span><span>assembly_object_files</span><span style="color:#569cd6;">) $(</span><span>rust_os</span><span style="color:#569cd6;">) </span></code></pre> <p>Now we can execute <code>make run</code> again, and it compiles without errors. However, it doesn’t boot anymore:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>GRUB error: no multiboot header found. </span></code></pre> <p>What happened? Well, the linker removed unused sections. And since we don’t use the Multiboot section anywhere, <code>ld</code> removes it, too. 
So we need to tell the linker explicitly that it should keep this section. The <code>KEEP</code> command does exactly that, so we add it to the linker script (<code>linker.ld</code>):</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>.boot : </span><span>{ </span><span> /* ensure that the multiboot header is at the beginning */ </span><span> KEEP(*(.multiboot_header)) </span><span>} </span></code></pre> <p>Now everything should work again (the green <code>OKAY</code>). But there is another linking issue, which is triggered by some other example code.</p> <h4 id="panic-abort"><a class="zola-anchor" href="#panic-abort" aria-label="Anchor link for: panic-abort">🔗</a>panic = “abort”</h4> <p>The following snippet still fails:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span> </span><span style="color:#569cd6;">... </span><span> </span><span style="color:#569cd6;">let</span><span> test = (</span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span style="color:#b5cea8;">3</span><span>).flat_map(|x| </span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span>x).zip(</span><span style="color:#b5cea8;">0</span><span style="color:#569cd6;">..</span><span>); </span></code></pre> <p>The error is a linker error again (hence the ugly error message):</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>target/x86_64-blog_os/debug/libblog_os.a(blog_os-b5a29f28b14f1f1f.0.o): </span><span> In function `core::ptr::drop_in_place&lt;core::iter::Zip&lt; </span><span> core::iter::FlatMap&lt;core::ops::Range&lt;i32&gt;, core::ops::Range&lt;i32&gt;, </span><span> closure&gt;, core::ops::RangeFrom&lt;i32&gt;&gt;&gt;&#39;: </span><span> /…/rust/src/libcore/ptr.rs:66: </span><span> undefined reference to `_Unwind_Resume&#39; 
</span><span>target/x86_64-blog_os/debug/libblog_os.a(blog_os-b5a29f28b14f1f1f.0.o): </span><span> In function `core::iter::iterator::Iterator::zip&lt;core::iter::FlatMap&lt; </span><span> core::ops::Range&lt;i32&gt;, core::ops::Range&lt;i32&gt;, closure&gt;, </span><span> core::ops::RangeFrom&lt;i32&gt;&gt;&#39;: </span><span> /…/rust/src/libcore/iter/iterator.rs:389: </span><span> undefined reference to `_Unwind_Resume&#39; </span><span>... </span></code></pre> <p>So the linker can’t find a function named <code>_Unwind_Resume</code> that is referenced e.g. in <code>iter/iterator.rs:389</code> in libcore. This reference is not really there at <a href="https://github.com/rust-lang/rust/blob/c58c928e658d2e45f816fd05796a964aa83759da/src/libcore/iter/iterator.rs#L389">line 389</a> of libcore’s <code>iterator.rs</code>. Instead, it is a compiler inserted <em>landing pad</em>, which is used for panic handling.</p> <p>By default, the destructors of all stack variables are run when a <code>panic</code> occurs. This is called <em>unwinding</em> and allows parent threads to recover from panics. However, it requires a platform specific gcc library, which isn’t available in our kernel.</p> <p>Fortunately, Rust allows us to disable unwinding for our target. 
For that we add the following line to our <code>x86_64-blog_os.json</code> file:</p> <pre data-lang="json" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-json "><code class="language-json" data-lang="json"><span>{ </span><span> </span><span style="color:#d69d85;">&quot;...&quot;</span><span style="color:#ff3333;">, </span><span> </span><span style="color:#d69d85;">&quot;panic-strategy&quot;</span><span>: </span><span style="color:#d69d85;">&quot;abort&quot; </span><span>} </span><span> </span></code></pre> <p>By setting the <a href="https://github.com/nox/rust-rfcs/blob/master/text/1513-less-unwinding.md">panic strategy</a> to <code>abort</code> instead of the default <code>unwind</code>, we disable all unwinding in our kernel. Let’s try <code>make run</code> again:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span> Compiling core v0.0.0 (file:///…/rust/src/libcore) </span><span> Finished release [optimized] target(s) in 22.24 secs </span><span> Finished dev [unoptimized + debuginfo] target(s) in 0.5 secs </span><span>target/x86_64-blog_os/debug/libblog_os.a(blog_os-b5a29f28b14f1f1f.0.o): </span><span> In function `core::ptr::drop_in_place&lt;…&gt;&#39;: </span><span> /…/src/libcore/ptr.rs:66: </span><span> undefined reference to `_Unwind_Resume&#39; </span><span>... </span></code></pre> <p>We see that <code>xargo</code> recompiles the <code>core</code> crate, but the <code>_Unwind_Resume</code> error still occurs. This is because our <code>blog_os</code> crate was, for some reason, not recompiled and thus still references the unwinding function. 
To fix this, we need to force a recompile using <code>cargo clean</code>:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; cargo clean </span><span>&gt; make run </span><span> Compiling rlibc v1.0.0 </span><span> Compiling blog_os v0.1.0 (file:///home/philipp/Documents/blog_os/tags) </span><span>warning: unused variable: `test` […] </span><span> </span><span> Finished dev [unoptimized + debuginfo] target(s) in 0.60 secs </span></code></pre> <p>It worked! We no longer see linker errors and our kernel prints <code>OKAY</code> again.</p> <h2 id="hello-world"><a class="zola-anchor" href="#hello-world" aria-label="Anchor link for: hello-world">🔗</a>Hello World!</h2> <p>Finally, it’s time for a <code>Hello World!</code> from Rust:</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust"><span>#[no_mangle] </span><span style="color:#569cd6;">pub extern fn </span><span>rust_main() { </span><span> </span><span style="color:#608b4e;">// ATTENTION: we have a very small stack and no guard page </span><span> </span><span> </span><span style="color:#569cd6;">let</span><span> hello = </span><span style="color:#569cd6;">b</span><span style="color:#d69d85;">&quot;Hello World!&quot;</span><span>; </span><span> </span><span style="color:#569cd6;">let</span><span> color_byte = </span><span style="color:#b5cea8;">0x1f</span><span>; </span><span style="color:#608b4e;">// white foreground, blue background </span><span> </span><span> </span><span style="color:#569cd6;">let mut</span><span> hello_colored = [color_byte; </span><span style="color:#b5cea8;">24</span><span>]; </span><span> </span><span style="color:#569cd6;">for </span><span>(i, char_byte) </span><span style="color:#569cd6;">in</span><span> hello.into_iter().enumerate() { </span><span> hello_colored[i*</span><span style="color:#b5cea8;">2</span><span>] = *char_byte; </span><span> } </span><span> </span><span> 
</span><span style="color:#608b4e;">// write `Hello World!` to the center of the VGA text buffer </span><span> </span><span style="color:#569cd6;">let</span><span> buffer_ptr = (</span><span style="color:#b5cea8;">0xb8000 </span><span>+ </span><span style="color:#b5cea8;">1988</span><span>) </span><span style="color:#569cd6;">as *mut _</span><span>; </span><span> </span><span style="color:#569cd6;">unsafe </span><span>{ *buffer_ptr = hello_colored }; </span><span> </span><span> </span><span style="color:#569cd6;">loop</span><span>{} </span><span>} </span></code></pre> <p>Some notes:</p> <ul> <li>The <code>b</code> prefix creates a <a href="https://doc.rust-lang.org/reference/tokens.html#characters-and-strings">byte string</a>, which is just an array of <code>u8</code></li> <li><a href="https://doc.rust-lang.org/nightly/core/iter/trait.Iterator.html#method.enumerate">enumerate</a> is an <code>Iterator</code> method that adds the current index <code>i</code> to elements</li> <li><code>buffer_ptr</code> is a <a href="https://doc.rust-lang.org/book/raw-pointers.html">raw pointer</a> that points to the center of the VGA text buffer</li> <li>Rust doesn’t know the VGA buffer and thus can’t guarantee that writing to the <code>buffer_ptr</code> is safe (it could point to important data). So we need to tell Rust that we know what we are doing by using an <a href="https://doc.rust-lang.org/book/unsafe.html">unsafe block</a>.</li> </ul> <h3 id="stack-overflows"><a class="zola-anchor" href="#stack-overflows" aria-label="Anchor link for: stack-overflows">🔗</a>Stack Overflows</h3> <p>Since we still use the small 64 byte <a href="https://os.phil-opp.com/entering-longmode/#creating-a-stack">stack from the last post</a>, we must be careful not to <a href="https://en.wikipedia.org/wiki/Stack_overflow">overflow</a> it. 
Normally, Rust tries to avoid stack overflows through <em>guard pages</em>: The page below the stack isn’t mapped, so a stack overflow triggers a page fault (instead of silently overwriting random memory). But we can’t unmap the page below our stack right now since we currently use only a single big page. Fortunately, the stack is located just above the page tables. So some important page table entry would probably get overwritten on stack overflow and then a page fault occurs, too.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>Until now, we have written magic bits to some memory location whenever we wanted to print something to the screen. In the <a href="https://os.phil-opp.com/printing-to-screen/">next post</a> we create an abstraction for the VGA text buffer that allows us to print strings in different colors and provides a simple interface.</p> Entering Long Mode Tue, 25 Aug 2015 00:00:00 +0000 https://os.phil-opp.com/entering-longmode/ https://os.phil-opp.com/entering-longmode/ <p>In the <a href="https://os.phil-opp.com/multiboot-kernel/">previous post</a> we created a minimal multiboot kernel. It just prints <code>OK</code> and hangs. The goal is to extend it and call 64-bit <a href="https://www.rust-lang.org/">Rust</a> code. But the CPU is currently in <a href="https://en.wikipedia.org/wiki/Protected_mode">protected mode</a> and allows only 32-bit instructions and up to 4GiB of memory. So we need to set up <em>Paging</em> and switch to the 64-bit <a href="https://en.wikipedia.org/wiki/Long_mode">long mode</a> first.</p> <span id="continue-reading"></span> <p>I tried to explain everything in detail and to keep the code as simple as possible. If you have any questions, suggestions, or issues, please leave a comment or <a href="https://github.com/phil-opp/blog_os/issues">create an issue</a> on GitHub. 
The source code is available in a <a href="https://github.com/phil-opp/blog_os/tree/first_edition_post_2/src/arch/x86_64">repository</a>, too.</p> <h2 id="some-tests"><a class="zola-anchor" href="#some-tests" aria-label="Anchor link for: some-tests">🔗</a>Some Tests</h2> <p>To avoid bugs and strange errors on old CPUs we should check if the processor supports every needed feature. If not, the kernel should abort and display an error message. To handle errors easily, we create an error procedure in <code>boot.asm</code>. It prints a rudimentary <code>ERR: X</code> message, where X is an error code letter, and hangs:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span style="color:#608b4e;">; Prints `ERR: ` and the given error code to screen and hangs. </span><span style="color:#608b4e;">; parameter: error code (in ascii) in al </span><span>error: </span><span> </span><span style="color:#569cd6;">mov </span><span>dword [</span><span style="color:#b4cea8;">0xb8000</span><span>], </span><span style="color:#b4cea8;">0x4f524f45 </span><span> </span><span style="color:#569cd6;">mov </span><span>dword [</span><span style="color:#b4cea8;">0xb8004</span><span>], </span><span style="color:#b4cea8;">0x4f3a4f52 </span><span> </span><span style="color:#569cd6;">mov </span><span>dword [</span><span style="color:#b4cea8;">0xb8008</span><span>], </span><span style="color:#b4cea8;">0x4f204f20 </span><span> </span><span style="color:#569cd6;">mov </span><span>byte [</span><span style="color:#b4cea8;">0xb800a</span><span>], al </span><span> </span><span style="color:#569cd6;">hlt </span></code></pre> <p>At address <code>0xb8000</code> begins the so-called <a href="https://en.wikipedia.org/wiki/VGA-compatible_text_mode">VGA text buffer</a>. It’s an array of screen characters that are displayed by the graphics card. 
A <a href="https://os.phil-opp.com/printing-to-screen/">future post</a> will cover the VGA buffer in detail and create a Rust interface to it. But for now, manual bit-fiddling is the easiest option.</p> <p>A screen character consists of an 8-bit color code and an 8-bit <a href="https://en.wikipedia.org/wiki/ASCII">ASCII</a> character. We used the color code <code>4f</code> for all characters, which means white text on a red background. <code>0x52</code> is an ASCII <code>R</code>, <code>0x45</code> is an <code>E</code>, <code>0x3a</code> is a <code>:</code>, and <code>0x20</code> is a space. Since x86 is little-endian, the lowest byte of each <code>dword</code> is stored at the lowest address, which is why the characters appear in reverse order inside the constants. The second space is overwritten by the given ASCII byte. Finally, the CPU is stopped with the <code>hlt</code> instruction.</p> <p>Now we can add some check <em>functions</em>. A function is just a normal label with a <code>ret</code> (return) instruction at the end. The <code>call</code> instruction can be used to call it. Unlike the <code>jmp</code> instruction that just jumps to a memory address, the <code>call</code> instruction will push a return address to the stack (and the <code>ret</code> will jump to this address). But we don’t have a stack yet. The <a href="https://stackoverflow.com/a/1464052/866447">stack pointer</a> in the esp register could point to some important data or even invalid memory. So we need to update it and point it to some valid stack memory.</p> <h3 id="creating-a-stack"><a class="zola-anchor" href="#creating-a-stack" aria-label="Anchor link for: creating-a-stack">🔗</a>Creating a Stack</h3> <p>To create stack memory we reserve some bytes at the end of our <code>boot.asm</code>:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>... 
</span><span>section .bss </span><span>stack_bottom: </span><span> resb </span><span style="color:#b4cea8;">64 </span><span>stack_top: </span></code></pre> <p>A stack doesn’t need to be initialized because we will <code>pop</code> only what we <code>pushed</code> before. So storing the stack memory in the executable file would make it unnecessarily large. By using the <a href="https://en.wikipedia.org/wiki/.bss">.bss</a> section and the <code>resb</code> (reserve byte) command, we just store the length of the uninitialized data (= 64). When loading the executable, GRUB will create the section of the required size in memory.</p> <p>To use the new stack, we update the stack pointer register right after <code>start</code>:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>global start </span><span> </span><span>section .text </span><span>bits 32 </span><span>start: </span><span> </span><span style="color:#569cd6;">mov </span><span>esp, stack_top </span><span> </span><span style="color:#608b4e;"> ; print `OK` to screen </span><span> ... </span></code></pre> <p>We use <code>stack_top</code> because the stack grows downwards: A <code>push eax</code> subtracts 4 from <code>esp</code> and does a <code>mov [esp], eax</code> afterwards (<code>eax</code> is a general-purpose register).</p> <p>Now we have a valid stack pointer and are able to call functions. The following check functions are just here for completeness and I won’t explain the details. Basically they all work the same: They will check for a feature and jump to <code>error</code> if it’s not available.</p> <h3 id="multiboot-check"><a class="zola-anchor" href="#multiboot-check" aria-label="Anchor link for: multiboot-check">🔗</a>Multiboot check</h3> <p>We rely on some Multiboot features in the next posts. To make sure the kernel was really loaded by a Multiboot-compliant bootloader, we can check the <code>eax</code> register. 
According to the Multiboot specification (<a href="https://nongnu.askapache.com/grub/phcoder/multiboot.pdf">PDF</a>), the bootloader must write the magic value <code>0x36d76289</code> to it before loading a kernel. To verify that we can add a simple function:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>check_multiboot: </span><span> </span><span style="color:#569cd6;">cmp </span><span>eax, </span><span style="color:#b4cea8;">0x36d76289 </span><span> </span><span style="color:#569cd6;">jne </span><span>.no_multiboot </span><span> </span><span style="color:#569cd6;">ret </span><span>.no_multiboot: </span><span> </span><span style="color:#569cd6;">mov </span><span>al, </span><span style="color:#d69d85;">&quot;0&quot; </span><span> </span><span style="color:#569cd6;">jmp </span><span>error </span></code></pre> <p>We use the <code>cmp</code> instruction to compare the value in <code>eax</code> to the magic value. If the values are equal, the <code>cmp</code> instruction sets the zero flag in the <a href="https://en.wikipedia.org/wiki/FLAGS_register">FLAGS register</a>. The <code>jne</code> (“jump if not equal”) instruction reads this zero flag and jumps to the given address if it’s not set. Thus we jump to the <code>.no_multiboot</code> label if <code>eax</code> does not contain the magic value.</p> <p>In <code>no_multiboot</code>, we use the <code>jmp</code> (“jump”) instruction to jump to our error function. We could just as well use the <code>call</code> instruction, which additionally pushes the return address. But the return address is not needed because <code>error</code> never returns. 
To pass <code>0</code> as error code to the <code>error</code> function, we move it into <code>al</code> before the jump (<code>error</code> will read it from there).</p> <h3 id="cpuid-check"><a class="zola-anchor" href="#cpuid-check" aria-label="Anchor link for: cpuid-check">🔗</a>CPUID check</h3> <p><a href="https://wiki.osdev.org/CPUID">CPUID</a> is a CPU instruction that can be used to get various information about the CPU. But not every processor supports it. CPUID detection is quite laborious, so we just copy a detection function from the <a href="https://wiki.osdev.org/Setting_Up_Long_Mode#Detection_of_CPUID">OSDev wiki</a>:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>check_cpuid: </span><span style="color:#608b4e;"> ; Check if CPUID is supported by attempting to flip the ID bit (bit 21) </span><span style="color:#608b4e;"> ; in the FLAGS register. If we can flip it, CPUID is available. 
</span><span> </span><span style="color:#608b4e;"> ; Copy FLAGS in to EAX via stack </span><span> </span><span style="color:#569cd6;">pushfd </span><span> </span><span style="color:#569cd6;">pop </span><span>eax </span><span> </span><span style="color:#608b4e;"> ; Copy to ECX as well for comparing later on </span><span> </span><span style="color:#569cd6;">mov </span><span>ecx, eax </span><span> </span><span style="color:#608b4e;"> ; Flip the ID bit </span><span> </span><span style="color:#569cd6;">xor </span><span>eax, </span><span style="color:#b4cea8;">1 </span><span>&lt;&lt; </span><span style="color:#b4cea8;">21 </span><span> </span><span style="color:#608b4e;"> ; Copy EAX to FLAGS via the stack </span><span> </span><span style="color:#569cd6;">push </span><span>eax </span><span> </span><span style="color:#569cd6;">popfd </span><span> </span><span style="color:#608b4e;"> ; Copy FLAGS back to EAX (with the flipped bit if CPUID is supported) </span><span> </span><span style="color:#569cd6;">pushfd </span><span> </span><span style="color:#569cd6;">pop </span><span>eax </span><span> </span><span style="color:#608b4e;"> ; Restore FLAGS from the old version stored in ECX (i.e. flipping the </span><span style="color:#608b4e;"> ; ID bit back if it was ever flipped). </span><span> </span><span style="color:#569cd6;">push </span><span>ecx </span><span> </span><span style="color:#569cd6;">popfd </span><span> </span><span style="color:#608b4e;"> ; Compare EAX and ECX. If they are equal then that means the bit </span><span style="color:#608b4e;"> ; wasn&#39;t flipped, and CPUID isn&#39;t supported. 
</span><span> </span><span style="color:#569cd6;">cmp </span><span>eax, ecx </span><span> </span><span style="color:#569cd6;">je </span><span>.no_cpuid </span><span> </span><span style="color:#569cd6;">ret </span><span>.no_cpuid: </span><span> </span><span style="color:#569cd6;">mov </span><span>al, </span><span style="color:#d69d85;">&quot;1&quot; </span><span> </span><span style="color:#569cd6;">jmp </span><span>error </span></code></pre> <p>Basically, the <code>CPUID</code> instruction is supported if we can flip some bit in the <a href="https://en.wikipedia.org/wiki/FLAGS_register">FLAGS register</a>. We can’t operate on the flags register directly, so we need to load it into some general purpose register such as <code>eax</code> first. The only way to do this is to push the <code>FLAGS</code> register on the stack through the <code>pushfd</code> instruction and then pop it into <code>eax</code>. Equally, we write it back through <code>push ecx</code> and <code>popfd</code>. To flip the bit we use the <code>xor</code> instruction to perform an <a href="https://en.wikipedia.org/wiki/Exclusive_or">exclusive OR</a>. Finally we compare the two values and jump to <code>.no_cpuid</code> if both are equal (<code>je</code> – “jump if equal”). The <code>.no_cpuid</code> code just jumps to the <code>error</code> function with error code <code>1</code>.</p> <p>Don’t worry, you don’t need to understand the details.</p> <h3 id="long-mode-check"><a class="zola-anchor" href="#long-mode-check" aria-label="Anchor link for: long-mode-check">🔗</a>Long Mode check</h3> <p>Now we can use CPUID to detect whether long mode can be used. 
I use code from <a href="https://wiki.osdev.org/Setting_Up_Long_Mode#x86_or_x86-64">OSDev</a> again:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>check_long_mode: </span><span style="color:#608b4e;"> ; test if extended processor info is available </span><span> </span><span style="color:#569cd6;">mov </span><span>eax, </span><span style="color:#b4cea8;">0x80000000</span><span style="color:#608b4e;"> ; implicit argument for cpuid </span><span> </span><span style="color:#569cd6;">cpuid</span><span style="color:#608b4e;"> ; get highest supported argument </span><span> </span><span style="color:#569cd6;">cmp </span><span>eax, </span><span style="color:#b4cea8;">0x80000001</span><span style="color:#608b4e;"> ; it needs to be at least 0x80000001 </span><span> </span><span style="color:#569cd6;">jb </span><span>.no_long_mode</span><span style="color:#608b4e;"> ; if it&#39;s less, the CPU is too old for long mode </span><span> </span><span style="color:#608b4e;"> ; use extended info to test if long mode is available </span><span> </span><span style="color:#569cd6;">mov </span><span>eax, </span><span style="color:#b4cea8;">0x80000001</span><span style="color:#608b4e;"> ; argument for extended processor info </span><span> </span><span style="color:#569cd6;">cpuid</span><span style="color:#608b4e;"> ; returns various feature bits in ecx and edx </span><span> </span><span style="color:#569cd6;">test </span><span>edx, </span><span style="color:#b4cea8;">1 </span><span>&lt;&lt; </span><span style="color:#b4cea8;">29</span><span style="color:#608b4e;"> ; test if the LM-bit is set in the D-register </span><span> </span><span style="color:#569cd6;">jz </span><span>.no_long_mode</span><span style="color:#608b4e;"> ; If it&#39;s not set, there is no long mode </span><span> </span><span style="color:#569cd6;">ret </span><span>.no_long_mode: </span><span> </span><span 
style="color:#569cd6;">mov </span><span>al, </span><span style="color:#d69d85;">&quot;2&quot; </span><span> </span><span style="color:#569cd6;">jmp </span><span>error </span></code></pre> <p>Like many low-level things, CPUID is a bit strange. Instead of taking a parameter, the <code>cpuid</code> instruction implicitly uses the <code>eax</code> register as its argument. To test if long mode is available, we need to call <code>cpuid</code> with <code>0x80000001</code> in <code>eax</code>. This loads some information into the <code>ecx</code> and <code>edx</code> registers. Long mode is supported if bit 29 of <code>edx</code> (the LM bit) is set. <a href="https://en.wikipedia.org/wiki/CPUID#EAX.3D80000001h:_Extended_Processor_Info_and_Feature_Bits">Wikipedia</a> has detailed information.</p> <p>If you look at the assembly above, you’ll probably notice that we call <code>cpuid</code> twice. The reason is that the CPUID instruction started with only a few functions and was extended over time. So old processors may not know the <code>0x80000001</code> argument at all. To test if they do, we need to invoke <code>cpuid</code> with <code>0x80000000</code> in <code>eax</code> first. It returns the highest supported parameter value in <code>eax</code>. If it’s at least <code>0x80000001</code>, we can test for long mode as described above. Otherwise, the CPU is old and doesn’t know what long mode is either. 
In that case, we directly jump to <code>.no_long_mode</code> through the <code>jb</code> instruction (“jump if below”).</p> <h3 id="putting-it-together"><a class="zola-anchor" href="#putting-it-together" aria-label="Anchor link for: putting-it-together">🔗</a>Putting it together</h3> <p>We just call these check functions right after <code>start</code>:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>global start </span><span> </span><span>section .text </span><span>bits 32 </span><span>start: </span><span> </span><span style="color:#569cd6;">mov </span><span>esp, stack_top </span><span> </span><span> </span><span style="color:#569cd6;">call </span><span>check_multiboot </span><span> </span><span style="color:#569cd6;">call </span><span>check_cpuid </span><span> </span><span style="color:#569cd6;">call </span><span>check_long_mode </span><span> </span><span style="color:#608b4e;"> ; print `OK` to screen </span><span> ... </span></code></pre> <p>When the CPU doesn’t support a needed feature, we get an error message with a unique error code. Now we can start the real work.</p> <h2 id="paging"><a class="zola-anchor" href="#paging" aria-label="Anchor link for: paging">🔗</a>Paging</h2> <p><em>Paging</em> is a memory management scheme that separates virtual and physical memory. The address space is split into equal-sized <em>pages</em> and a <em>page table</em> specifies which virtual page points to which physical page. 
If you have never heard of paging, you might want to look at the paging introduction (<a href="http://pages.cs.wisc.edu/~remzi/OSTEP/vm-paging.pdf">PDF</a>) of the <a href="http://pages.cs.wisc.edu/~remzi/OSTEP/">Three Easy Pieces</a> OS book.</p> <p>In long mode, x86 uses a page size of 4096 bytes and a 4-level page table that consists of:</p> <ul> <li>the Page-Map Level-4 Table (PML4),</li> <li>the Page-Directory Pointer Table (PDP),</li> <li>the Page-Directory Table (PD),</li> <li>and the Page Table (PT).</li> </ul> <p>As I don’t like these names, I will call them P4, P3, P2, and P1 from now on.</p> <p>Each page table contains 512 entries and one entry is 8 bytes, so each table fits exactly in one page (<code>512*8 = 4096</code>). To translate a virtual address to a physical address, the CPU<sup class="footnote-reference"><a href="#hardware_lookup">1</a></sup> will do the following<sup class="footnote-reference"><a href="#virtual_physical_translation_source">2</a></sup>:</p> <p><img src="https://os.phil-opp.com/entering-longmode/X86_Paging_64bit.svg" alt="translation of virtual to physical addresses in 64 bit mode" /></p> <ol> <li>Get the address of the P4 table from the CR3 register</li> <li>Use bits 39-47 (9 bits) as an index into P4 (<code>2^9 = 512 = number of entries</code>)</li> <li>Use the following 9 bits as an index into P3</li> <li>Use the following 9 bits as an index into P2</li> <li>Use the following 9 bits as an index into P1</li> <li>Use the last 12 bits as the page offset (<code>2^12 = 4096 = page size</code>)</li> </ol> <p>But what happens to bits 48-63 of the 64-bit virtual address? Well, they can’t be used. The “64-bit” long mode is in fact just a 48-bit mode. The bits 48-63 must be copies of bit 47, so each valid virtual address is still unique. 
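</p> <p>The lookup steps above can be sketched in Rust. This is only an illustration of the bit arithmetic, not code from the kernel: the function names <code>table_indices</code>, <code>page_offset</code>, and <code>is_canonical</code> are made up for this example, and the real translation is done by the CPU in hardware.</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust">/// Splits a 64-bit virtual address into its four page table indices
/// (P4, P3, P2, P1). Each index is 9 bits wide, so it is in 0..512.
/// Hypothetical helper, for illustration only.
fn table_indices(addr: u64) -> [u64; 4] {
    [
        (addr >> 39) % 512, // bits 39-47: index into P4
        (addr >> 30) % 512, // bits 30-38: index into P3
        (addr >> 21) % 512, // bits 21-29: index into P2
        (addr >> 12) % 512, // bits 12-20: index into P1
    ]
}

/// Returns the offset inside the 4096-byte page (bits 0-11).
fn page_offset(addr: u64) -> u64 {
    addr % 4096
}

/// Checks that bits 48-63 are copies of bit 47.
fn is_canonical(addr: u64) -> bool {
    // Bits 47-63 (17 bits) must be either all zeros or all ones.
    let upper = addr >> 47;
    upper == 0 || upper == 0x1ffff
}

fn main() {
    // The VGA text buffer address used earlier in this post.
    let addr: u64 = 0xb8000;
    println!("{:?}", table_indices(addr)); // prints [0, 0, 0, 184]
    println!("{:#x}", page_offset(addr)); // prints 0x0
    println!("{}", is_canonical(addr)); // prints true
}
</code></pre> <p>The CPU rejects addresses that fail this check instead of translating them. 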
For more information see <a href="https://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details">Wikipedia</a>.</p> <p>An entry in the P4, P3, P2, and P1 tables consists of the page aligned 52-bit <em>physical</em> address of the frame or the next page table and the following bits that can be OR-ed in:</p> <table><thead><tr><th>Bit(s)</th><th>Name</th><th>Meaning</th></tr></thead><tbody> <tr><td>0</td><td>present</td><td>the page is currently in memory</td></tr> <tr><td>1</td><td>writable</td><td>it’s allowed to write to this page</td></tr> <tr><td>2</td><td>user accessible</td><td>if not set, only kernel mode code can access this page</td></tr> <tr><td>3</td><td>write through caching</td><td>writes go directly to memory</td></tr> <tr><td>4</td><td>disable cache</td><td>no cache is used for this page</td></tr> <tr><td>5</td><td>accessed</td><td>the CPU sets this bit when this page is used</td></tr> <tr><td>6</td><td>dirty</td><td>the CPU sets this bit when a write to this page occurs</td></tr> <tr><td>7</td><td>huge page/null</td><td>must be 0 in P1 and P4, creates a 1GiB page in P3, creates a 2MiB page in P2</td></tr> <tr><td>8</td><td>global</td><td>page isn’t flushed from caches on address space switch (PGE bit of CR4 register must be set)</td></tr> <tr><td>9-11</td><td>available</td><td>can be used freely by the OS</td></tr> <tr><td>52-62</td><td>available</td><td>can be used freely by the OS</td></tr> <tr><td>63</td><td>no execute</td><td>forbid executing code on this page (the NXE bit in the EFER register must be set)</td></tr> </tbody></table> <h3 id="set-up-identity-paging"><a class="zola-anchor" href="#set-up-identity-paging" aria-label="Anchor link for: set-up-identity-paging">🔗</a>Set Up Identity Paging</h3> <p>When we switch to long mode, paging will be activated automatically. The CPU will then try to read the instruction at the following address, but this address is now a virtual address. So we need to do <em>identity mapping</em>, i.e. 
map a physical address to the same virtual address.</p> <p>The <code>huge page</code> bit is now very useful to us. It creates a 2MiB (when used in P2) or even a 1GiB page (when used in P3). So we could map the first <em>gigabytes</em> of the kernel with only one P4 and one P3 table by using 1GiB pages. Unfortunately, 1GiB pages are a relatively new feature; for example, Intel introduced them in 2010 with the <a href="https://en.wikipedia.org/wiki/Westmere_(microarchitecture)#Technology">Westmere architecture</a>. Therefore we will use 2MiB pages instead to make our kernel compatible with older computers, too.</p> <p>To identity map the first gigabyte of our kernel with 512 2MiB pages, we need one P4, one P3, and one P2 table. Of course we will replace them with finer-grained tables later. But now that we’re stuck with assembly, we choose the easiest way.</p> <p>We can add these three tables at the beginning<sup class="footnote-reference"><a href="#page_table_alignment">3</a></sup> of the <code>.bss</code> section:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>... </span><span> </span><span>section .bss </span><span>align </span><span style="color:#b4cea8;">4096 </span><span>p4_table: </span><span> resb </span><span style="color:#b4cea8;">4096 </span><span>p3_table: </span><span> resb </span><span style="color:#b4cea8;">4096 </span><span>p2_table: </span><span> resb </span><span style="color:#b4cea8;">4096 </span><span>stack_bottom: </span><span> resb </span><span style="color:#b4cea8;">64 </span><span>stack_top: </span></code></pre> <p>The <code>resb</code> command reserves the specified number of bytes without initializing them, so the 12KiB (three tables of 4096 bytes each) don’t need to be saved in the executable. The <code>align 4096</code> ensures that the page tables are page-aligned.</p> <p>When GRUB creates the <code>.bss</code> section in memory, it will initialize it to <code>0</code>. 
So the <code>p4_table</code> is already valid (it contains 512 non-present entries) but not very useful. To be able to map 2MiB pages, we need to link P4’s first entry to the <code>p3_table</code> and P3’s first entry to the <code>p2_table</code>:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>set_up_page_tables: </span><span style="color:#608b4e;"> ; map first P4 entry to P3 table </span><span> </span><span style="color:#569cd6;">mov </span><span>eax, p3_table </span><span> </span><span style="color:#569cd6;">or </span><span>eax, 0b11</span><span style="color:#608b4e;"> ; present + writable </span><span> </span><span style="color:#569cd6;">mov </span><span>[p4_table], eax </span><span> </span><span style="color:#608b4e;"> ; map first P3 entry to P2 table </span><span> </span><span style="color:#569cd6;">mov </span><span>eax, p2_table </span><span> </span><span style="color:#569cd6;">or </span><span>eax, 0b11</span><span style="color:#608b4e;"> ; present + writable </span><span> </span><span style="color:#569cd6;">mov </span><span>[p3_table], eax </span><span> </span><span style="color:#608b4e;"> ; TODO map each P2 entry to a huge 2MiB page </span><span> </span><span style="color:#569cd6;">ret </span></code></pre> <p>We just set the present and writable bits (<code>0b11</code> is a binary number) in the aligned P3 table address and move it to the first 4 bytes of the P4 table. Then we do the same to link the first P3 entry to the <code>p2_table</code>.</p> <p>Now we need to map P2’s first entry to a huge page starting at 0, P2’s second entry to a huge page starting at 2MiB, P2’s third entry to a huge page starting at 4MiB, and so on. 
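</p> <p>In other words, entry <code>i</code> of P2 should hold the address <code>i * 2MiB</code> with the present, writable, and huge page bits from the table above OR-ed in. As a rough sketch of the values we are after (plain Rust for illustration only; the name <code>p2_entry</code> and the constants are made up here and are not part of the kernel):</p> <pre data-lang="rust" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-rust "><code class="language-rust" data-lang="rust">// Flag bits of a page table entry (see the table above).
const PRESENT: u64 = 0b1; // bit 0
const WRITABLE: u64 = 0b10; // bit 1
const HUGE_PAGE: u64 = 0b1000_0000; // bit 7
const HUGE_PAGE_SIZE: u64 = 0x200000; // 2MiB

/// Value of the i-th P2 entry for an identity mapping with 2MiB pages.
/// Hypothetical helper, for illustration only.
fn p2_entry(i: u64) -> u64 {
    (i * HUGE_PAGE_SIZE) | PRESENT | WRITABLE | HUGE_PAGE
}

fn main() {
    // Print the first three entries: 0x83, 0x200083, 0x400083.
    for i in 0..3 {
        println!("entry {}: {:#x}", i, p2_entry(i));
    }
}
</code></pre> <p>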
It’s time for our first (and only) assembly loop:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>set_up_page_tables: </span><span> ... </span><span style="color:#608b4e;"> ; map each P2 entry to a huge 2MiB page </span><span> </span><span style="color:#569cd6;">mov </span><span>ecx, </span><span style="color:#b4cea8;">0</span><span style="color:#608b4e;"> ; counter variable </span><span> </span><span>.map_p2_table: </span><span style="color:#608b4e;"> ; map ecx-th P2 entry to a huge page that starts at address 2MiB*ecx </span><span> </span><span style="color:#569cd6;">mov </span><span>eax, </span><span style="color:#b4cea8;">0x200000</span><span style="color:#608b4e;"> ; 2MiB </span><span> </span><span style="color:#569cd6;">mul </span><span>ecx</span><span style="color:#608b4e;"> ; start address of ecx-th page </span><span> </span><span style="color:#569cd6;">or </span><span>eax, 0b10000011</span><span style="color:#608b4e;"> ; present + writable + huge </span><span> </span><span style="color:#569cd6;">mov </span><span>[p2_table + ecx * </span><span style="color:#b4cea8;">8</span><span>], eax</span><span style="color:#608b4e;"> ; map ecx-th entry </span><span> </span><span> </span><span style="color:#569cd6;">inc </span><span>ecx</span><span style="color:#608b4e;"> ; increase counter </span><span> </span><span style="color:#569cd6;">cmp </span><span>ecx, </span><span style="color:#b4cea8;">512</span><span style="color:#608b4e;"> ; if counter == 512, the whole P2 table is mapped </span><span> </span><span style="color:#569cd6;">jne </span><span>.map_p2_table</span><span style="color:#608b4e;"> ; else map the next entry </span><span> </span><span> </span><span style="color:#569cd6;">ret </span></code></pre> <p>Maybe I should first explain how an assembly loop works. We use the <code>ecx</code> register as a counter variable, just like <code>i</code> in a for loop. 
After mapping the <code>ecx-th</code> entry, we increase <code>ecx</code> by one and jump to <code>.map_p2_table</code> again if it’s still smaller than 512.</p> <p>To map a P2 entry we first calculate the start address of its page in <code>eax</code>: The <code>ecx-th</code> entry needs to be mapped to <code>ecx * 2MiB</code>. We use the <code>mul</code> operation for that, which multiplies <code>eax</code> by the given register and stores the result in <code>eax</code> (the upper 32 bits of the product go to <code>edx</code>, which stays zero here). Then we set the <code>present</code>, <code>writable</code>, and <code>huge page</code> bits and write it to the P2 entry. The address of the <code>ecx-th</code> entry in P2 is <code>p2_table + ecx * 8</code>, because each entry is 8 bytes in size.</p> <p>Now the first gigabyte (512 * 2MiB) of our kernel is identity mapped and thus accessible through the same physical and virtual addresses.</p> <h3 id="enable-paging"><a class="zola-anchor" href="#enable-paging" aria-label="Anchor link for: enable-paging">🔗</a>Enable Paging</h3> <p>To enable paging and enter long mode, we need to do the following:</p> <ol> <li>Write the address of the P4 table to the CR3 register (the CPU will look there, see the <a href="https://os.phil-opp.com/entering-longmode/#paging">paging section</a>)</li> <li>Enable PAE, because long mode is an extension of <a href="https://en.wikipedia.org/wiki/Physical_Address_Extension">Physical Address Extension</a> (PAE)</li> <li>Set the long mode bit in the EFER register</li> <li>Enable paging</li> </ol> <p>The assembly function looks like this (some boring bit-moving to various registers):</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>enable_paging: </span><span style="color:#608b4e;"> ; load P4 to cr3 register (cpu uses this to access the P4 table) </span><span> </span><span style="color:#569cd6;">mov </span><span>eax, p4_table </span><span> </span><span 
style="color:#569cd6;">mov </span><span>cr3, eax </span><span> </span><span style="color:#608b4e;"> ; enable PAE-flag in cr4 (Physical Address Extension) </span><span> </span><span style="color:#569cd6;">mov </span><span>eax, cr4 </span><span> </span><span style="color:#569cd6;">or </span><span>eax, </span><span style="color:#b4cea8;">1 </span><span>&lt;&lt; </span><span style="color:#b4cea8;">5 </span><span> </span><span style="color:#569cd6;">mov </span><span>cr4, eax </span><span> </span><span style="color:#608b4e;"> ; set the long mode bit in the EFER MSR (model specific register) </span><span> </span><span style="color:#569cd6;">mov </span><span>ecx, </span><span style="color:#b4cea8;">0xC0000080 </span><span> </span><span style="color:#569cd6;">rdmsr </span><span> </span><span style="color:#569cd6;">or </span><span>eax, </span><span style="color:#b4cea8;">1 </span><span>&lt;&lt; </span><span style="color:#b4cea8;">8 </span><span> </span><span style="color:#569cd6;">wrmsr </span><span> </span><span style="color:#608b4e;"> ; enable paging in the cr0 register </span><span> </span><span style="color:#569cd6;">mov </span><span>eax, cr0 </span><span> </span><span style="color:#569cd6;">or </span><span>eax, </span><span style="color:#b4cea8;">1 </span><span>&lt;&lt; </span><span style="color:#b4cea8;">31 </span><span> </span><span style="color:#569cd6;">mov </span><span>cr0, eax </span><span> </span><span> </span><span style="color:#569cd6;">ret </span></code></pre> <p>The <code>or eax, 1 &lt;&lt; X</code> is a common pattern. It sets the bit <code>X</code> in the eax register (<code>&lt;&lt;</code> is a left shift). 
Through <code>rdmsr</code> and <code>wrmsr</code> it’s possible to read/write to the so-called model specific registers at address <code>ecx</code> (in this case <code>ecx</code> points to the EFER register).</p> <p>Finally we need to call our new functions in <code>start</code>:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>... </span><span>start: </span><span> </span><span style="color:#569cd6;">mov </span><span>esp, stack_top </span><span> </span><span> </span><span style="color:#569cd6;">call </span><span>check_multiboot </span><span> </span><span style="color:#569cd6;">call </span><span>check_cpuid </span><span> </span><span style="color:#569cd6;">call </span><span>check_long_mode </span><span> </span><span> </span><span style="color:#569cd6;">call </span><span>set_up_page_tables</span><span style="color:#608b4e;"> ; new </span><span> </span><span style="color:#569cd6;">call </span><span>enable_paging</span><span style="color:#608b4e;"> ; new </span><span> </span><span style="color:#608b4e;"> ; print `OK` to screen </span><span> </span><span style="color:#569cd6;">mov </span><span>dword [</span><span style="color:#b4cea8;">0xb8000</span><span>], </span><span style="color:#b4cea8;">0x2f4b2f4f </span><span> </span><span style="color:#569cd6;">hlt </span><span>... </span></code></pre> <p>To test it we execute <code>make run</code>. If the green OK is still printed, we have successfully enabled paging!</p> <h2 id="the-global-descriptor-table"><a class="zola-anchor" href="#the-global-descriptor-table" aria-label="Anchor link for: the-global-descriptor-table">🔗</a>The Global Descriptor Table</h2> <p>After enabling Paging, the processor is in long mode. So we can use 64-bit instructions now, right? Wrong. The processor is still in a 32-bit compatibility submode. To actually execute 64-bit code, we need to set up a new Global Descriptor Table. 
The Global Descriptor Table (GDT) was used for <em>Segmentation</em> in old operating systems. I won’t explain Segmentation, but the <a href="http://pages.cs.wisc.edu/~remzi/OSTEP/">Three Easy Pieces</a> OS book once again has a good introduction (<a href="http://pages.cs.wisc.edu/~remzi/OSTEP/vm-segmentation.pdf">PDF</a>).</p> <p>Today almost everyone uses Paging instead of Segmentation (and so do we). But on x86, a GDT is always required, even when you’re not using Segmentation. GRUB has set up a valid 32-bit GDT for us, but now we need to switch to a long mode GDT.</p> <p>A GDT always starts with a 0-entry and contains an arbitrary number of segment entries afterwards. A 64-bit entry has the following format:</p> <table><thead><tr><th>Bit(s)</th><th>Name</th><th>Meaning</th></tr></thead><tbody> <tr><td>0-41</td><td>ignored</td><td>ignored in 64-bit mode</td></tr> <tr><td>42</td><td>conforming</td><td>the current privilege level can be higher than the specified level for code segments (else it must match exactly)</td></tr> <tr><td>43</td><td>executable</td><td>if set, it’s a code segment, else it’s a data segment</td></tr> <tr><td>44</td><td>descriptor type</td><td>should be 1 for code and data segments</td></tr> <tr><td>45-46</td><td>privilege</td><td>the <a href="https://wiki.osdev.org/Security#Rings">ring level</a>: 0 for kernel, 3 for user</td></tr> <tr><td>47</td><td>present</td><td>must be 1 for valid selectors</td></tr> <tr><td>48-52</td><td>ignored</td><td>ignored in 64-bit mode</td></tr> <tr><td>53</td><td>64-bit</td><td>should be set for 64-bit code segments</td></tr> <tr><td>54</td><td>32-bit</td><td>must be 0 for 64-bit segments</td></tr> <tr><td>55-63</td><td>ignored</td><td>ignored in 64-bit mode</td></tr> </tbody></table> <p>We need one code segment; a data segment is not necessary in 64-bit mode. Code segments have the following bits set: <em>descriptor type</em>, <em>present</em>, <em>executable</em> and the <em>64-bit</em> flag. 
Translated to assembly the long mode GDT looks like this:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>section .rodata </span><span>gdt64: </span><span> dq </span><span style="color:#b4cea8;">0</span><span style="color:#608b4e;"> ; zero entry </span><span> dq (</span><span style="color:#b4cea8;">1</span><span>&lt;&lt;</span><span style="color:#b4cea8;">43</span><span>) | (</span><span style="color:#b4cea8;">1</span><span>&lt;&lt;</span><span style="color:#b4cea8;">44</span><span>) | (</span><span style="color:#b4cea8;">1</span><span>&lt;&lt;</span><span style="color:#b4cea8;">47</span><span>) | (</span><span style="color:#b4cea8;">1</span><span>&lt;&lt;</span><span style="color:#b4cea8;">53</span><span>)</span><span style="color:#608b4e;"> ; code segment </span></code></pre> <p>We chose the <code>.rodata</code> section here because it’s initialized read-only data. The <code>dq</code> command stands for <code>define quad</code> and outputs a 64-bit constant (similar to <code>dw</code> and <code>dd</code>). And the <code>(1&lt;&lt;43)</code> is a bit shift that sets bit 43.</p> <h3 id="loading-the-gdt"><a class="zola-anchor" href="#loading-the-gdt" aria-label="Anchor link for: loading-the-gdt">🔗</a>Loading the GDT</h3> <p>To load our new 64-bit GDT, we have to tell the CPU its address and length. We do this by passing the memory location of a special pointer structure to the <code>lgdt</code> (load GDT) instruction. 
The pointer structure looks like this:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>gdt64: </span><span> dq </span><span style="color:#b4cea8;">0</span><span style="color:#608b4e;"> ; zero entry </span><span> dq (</span><span style="color:#b4cea8;">1</span><span>&lt;&lt;</span><span style="color:#b4cea8;">43</span><span>) | (</span><span style="color:#b4cea8;">1</span><span>&lt;&lt;</span><span style="color:#b4cea8;">44</span><span>) | (</span><span style="color:#b4cea8;">1</span><span>&lt;&lt;</span><span style="color:#b4cea8;">47</span><span>) | (</span><span style="color:#b4cea8;">1</span><span>&lt;&lt;</span><span style="color:#b4cea8;">53</span><span>)</span><span style="color:#608b4e;"> ; code segment </span><span>.pointer: </span><span> dw $ - gdt64 - </span><span style="color:#b4cea8;">1 </span><span> dq gdt64 </span></code></pre> <p>The first 2 bytes specify the (GDT length - 1). The <code>$</code> is a special symbol that is replaced with the current address (it’s equal to <code>.pointer</code> in our case). The following 8 bytes specify the GDT address. Labels that start with a dot (such as <code>.pointer</code>) are sub-labels of the last label without a dot. To access them, they must be prefixed with the parent label (e.g., <code>gdt64.pointer</code>).</p> <p>Now we can load the GDT in <code>start</code>:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>start: </span><span> ... </span><span> </span><span style="color:#569cd6;">call </span><span>enable_paging </span><span> </span><span style="color:#608b4e;"> ; load the 64-bit GDT </span><span> </span><span style="color:#569cd6;">lgdt </span><span>[gdt64.pointer] </span><span> </span><span style="color:#608b4e;"> ; print `OK` to screen </span><span> ... 
</span></code></pre> <p>When you still see the green <code>OK</code>, everything went fine and the new GDT is loaded. But we still can’t execute 64-bit code: The code selector register <code>cs</code> still has the values from the old GDT. To update it, we need to load it with the GDT offset (in bytes) of the desired segment. In our case the code segment starts at byte 8 of the GDT, but we don’t want to hardcode that 8 (in case we modify our GDT later). Instead, we add a <code>.code</code> label to our GDT, that calculates the offset directly from the GDT:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>section .rodata </span><span>gdt64: </span><span> dq </span><span style="color:#b4cea8;">0</span><span style="color:#608b4e;"> ; zero entry </span><span>.code: equ $ - gdt64</span><span style="color:#608b4e;"> ; new </span><span> dq (</span><span style="color:#b4cea8;">1</span><span>&lt;&lt;</span><span style="color:#b4cea8;">43</span><span>) | (</span><span style="color:#b4cea8;">1</span><span>&lt;&lt;</span><span style="color:#b4cea8;">44</span><span>) | (</span><span style="color:#b4cea8;">1</span><span>&lt;&lt;</span><span style="color:#b4cea8;">47</span><span>) | (</span><span style="color:#b4cea8;">1</span><span>&lt;&lt;</span><span style="color:#b4cea8;">53</span><span>)</span><span style="color:#608b4e;"> ; code segment </span><span>.pointer: </span><span> ... </span></code></pre> <p>We can’t just use a normal label here, since we need the table <em>offset</em>. We calculate this offset using the current address <code>$</code> and set the label to this value using <a href="https://www.nasm.us/doc/nasmdoc3.html#section-3.2.4">equ</a>. Now we can use <code>gdt64.code</code> instead of 8 and this label will still work if we modify the GDT.</p> <p>In order to finally enter the true 64-bit mode, we need to load <code>cs</code> with <code>gdt64.code</code>. 
But we can’t do it through <code>mov</code>. The only way to reload the code selector is a <em>far jump</em> or a <em>far return</em>. These instructions work like a normal jump/return but change the code selector. We use a far jump to a long mode label:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>global start </span><span>extern long_mode_start </span><span>... </span><span>start: </span><span> ... </span><span> </span><span style="color:#569cd6;">lgdt </span><span>[gdt64.pointer] </span><span> </span><span> </span><span style="color:#569cd6;">jmp </span><span>gdt64.code:long_mode_start </span><span>... </span></code></pre> <p>The actual <code>long_mode_start</code> label is declared as <code>extern</code>, so it’s defined in another file. The <code>jmp gdt64.code:long_mode_start</code> is the far jump mentioned above.</p> <p>I put the 64-bit code into a new file to separate it from the 32-bit code, so that we can’t accidentally call the (now invalid) 32-bit code. The new file (I named it <code>long_mode_init.asm</code>) looks like this:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>global long_mode_start </span><span> </span><span>section .text </span><span>bits 64 </span><span>long_mode_start: </span><span style="color:#608b4e;"> ; print `OKAY` to screen </span><span> </span><span style="color:#569cd6;">mov </span><span>rax, </span><span style="color:#b4cea8;">0x2f592f412f4b2f4f </span><span> </span><span style="color:#569cd6;">mov </span><span>qword [</span><span style="color:#b4cea8;">0xb8000</span><span>], rax </span><span> </span><span style="color:#569cd6;">hlt </span></code></pre> <p>You should see a green <code>OKAY</code> on the screen. 
Some notes on this last step:</p> <ul> <li>As the CPU expects 64-bit instructions now, we use <code>bits 64</code></li> <li>We can now use the extended registers. Instead of the 32-bit <code>eax</code>, <code>ebx</code>, etc. we now have the 64-bit <code>rax</code>, <code>rbx</code>, …</li> <li>and we can write these 64-bit registers directly to memory using <code>mov qword</code> (quad word)</li> </ul> <p><em>Congratulations</em>! You have successfully wrestled through this CPU configuration and compatibility mode mess :).</p> <h4 id="one-last-thing"><a class="zola-anchor" href="#one-last-thing" aria-label="Anchor link for: one-last-thing">🔗</a>One Last Thing</h4> <p>Above, we reloaded the code segment register <code>cs</code> with the new GDT offset. However, the data segment registers <code>ss</code>, <code>ds</code>, <code>es</code>, <code>fs</code>, and <code>gs</code> still contain the data segment offsets of the old GDT. This isn’t necessarily bad, since they’re ignored by almost all instructions in 64-bit mode. However, there are a few instructions that expect a valid data segment descriptor <em>or the null descriptor</em> in those registers. 
An example is the <a href="https://os.phil-opp.com/returning-from-exceptions/#the-iretq-instruction">iretq</a> instruction that we’ll need in the <a href="https://os.phil-opp.com/returning-from-exceptions/"><em>Returning from Exceptions</em></a> post.</p> <p>To avoid future problems, we reload all data segment registers with null:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>long_mode_start: </span><span style="color:#608b4e;"> ; load 0 into all data segment registers </span><span> </span><span style="color:#569cd6;">mov </span><span>ax, </span><span style="color:#b4cea8;">0 </span><span> </span><span style="color:#569cd6;">mov </span><span>ss, ax </span><span> </span><span style="color:#569cd6;">mov </span><span>ds, ax </span><span> </span><span style="color:#569cd6;">mov </span><span>es, ax </span><span> </span><span style="color:#569cd6;">mov </span><span>fs, ax </span><span> </span><span style="color:#569cd6;">mov </span><span>gs, ax </span><span> </span><span style="color:#608b4e;"> ; print `OKAY` to screen </span><span> ... </span></code></pre> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>It’s time to finally leave assembly behind and switch to <a href="https://www.rust-lang.org/">Rust</a>. Rust is a systems language without garbage collection that guarantees memory safety. Through a real type system and many abstractions it feels like a high-level language but can still be low-level enough for OS development. 
The <a href="https://os.phil-opp.com/set-up-rust/">next post</a> describes the Rust setup.</p> <h2 id="footnotes"><a class="zola-anchor" href="#footnotes" aria-label="Anchor link for: footnotes">🔗</a>Footnotes</h2> <div class="footnote-definition" id="hardware_lookup"><sup class="footnote-definition-label">1</sup> <p>In the x86 architecture, the page tables are <em>hardware walked</em>, so the CPU will look at the table on its own when it needs a translation. Other architectures, for example MIPS, just throw an exception and let the OS translate the virtual address.</p> </div> <div class="footnote-definition" id="virtual_physical_translation_source"><sup class="footnote-definition-label">2</sup> <p>Image source: <a href="https://commons.wikimedia.org/wiki/File:X86_Paging_64bit.svg">Wikipedia</a>, with modified font size, page table naming, and removed sign extended bits. The modified file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.</p> </div> <div class="footnote-definition" id="page_table_alignment"><sup class="footnote-definition-label">3</sup> <p>Page tables need to be page-aligned as the bits 0-11 are used for flags. By putting these tables at the beginning of <code>.bss</code>, the linker can just page align the whole section and we don’t have unused padding bytes in between.</p> </div> A minimal Multiboot Kernel Tue, 18 Aug 2015 00:00:00 +0000 https://os.phil-opp.com/multiboot-kernel/ https://os.phil-opp.com/multiboot-kernel/ <p>This post explains how to create a minimal x86 operating system kernel using the Multiboot standard. In fact, it will just boot and print <code>OK</code> to the screen. In subsequent blog posts we will extend it using the <a href="https://www.rust-lang.org/">Rust</a> programming language.</p> <span id="continue-reading"></span> <p>I tried to explain everything in detail and to keep the code as simple as possible. 
If you have any questions, suggestions or other issues, please leave a comment or <a href="https://github.com/phil-opp/blog_os/issues">create an issue</a> on GitHub. The source code is available in a <a href="https://github.com/phil-opp/blog_os/tree/first_edition_post_1/src/arch/x86_64">repository</a>, too.</p> <p>Note that this tutorial is written mainly for Linux. For some known problems on OS X see the comment section and <a href="https://github.com/phil-opp/blog_os/issues/55">this issue</a>. If you want to use a virtual Linux machine, you can find instructions and a Vagrantfile in Ashley Williams’s <a href="https://github.com/ashleygwilliams/x86-kernel">x86-kernel repository</a>.</p> <h2 id="overview"><a class="zola-anchor" href="#overview" aria-label="Anchor link for: overview">🔗</a>Overview</h2> <p>When you turn on a computer, it loads the <a href="https://en.wikipedia.org/wiki/BIOS">BIOS</a> from some special flash memory. The BIOS runs self-test and initialization routines for the hardware, then looks for bootable devices. If it finds one, control is transferred to its <em>bootloader</em>, which is a small portion of executable code stored at the device’s beginning. The bootloader has to determine the location of the kernel image on the device and load it into memory. It also needs to switch the CPU to the so-called <a href="https://en.wikipedia.org/wiki/Protected_mode">protected mode</a> because x86 CPUs start in the very limited <a href="https://wiki.osdev.org/Real_Mode">real mode</a> by default (to be compatible with programs from 1978).</p> <p>We won’t write a bootloader because that would be a complex project on its own (if you really want to do it, check out <a href="https://wiki.osdev.org/Rolling_Your_Own_Bootloader"><em>Rolling Your Own Bootloader</em></a>). Instead we will use one of the <a href="https://en.wikipedia.org/wiki/Comparison_of_boot_loaders">many well-tested bootloaders</a> out there to boot our kernel from a CD-ROM. 
But which one?</p> <h2 id="multiboot"><a class="zola-anchor" href="#multiboot" aria-label="Anchor link for: multiboot">🔗</a>Multiboot</h2> <p>Fortunately there is a bootloader standard: the <a href="https://en.wikipedia.org/wiki/Multiboot_Specification">Multiboot Specification</a>. Our kernel just needs to indicate that it supports Multiboot and every Multiboot-compliant bootloader can boot it. We will use the Multiboot 2 specification (<a href="https://nongnu.askapache.com/grub/phcoder/multiboot.pdf">PDF</a>) together with the well-known <a href="https://wiki.osdev.org/GRUB_2">GRUB 2</a> bootloader.</p> <p>To indicate our Multiboot 2 support to the bootloader, our kernel must start with a <em>Multiboot Header</em>, which has the following format:</p> <table><thead><tr><th>Field</th><th>Type</th><th>Value</th></tr></thead><tbody> <tr><td>magic number</td><td>u32</td><td><code>0xE85250D6</code></td></tr> <tr><td>architecture</td><td>u32</td><td><code>0</code> for i386, <code>4</code> for MIPS</td></tr> <tr><td>header length</td><td>u32</td><td>total header size, including tags</td></tr> <tr><td>checksum</td><td>u32</td><td><code>-(magic + architecture + header_length)</code></td></tr> <tr><td>tags</td><td>variable</td><td></td></tr> <tr><td>end tag</td><td>(u16, u16, u32)</td><td><code>(0, 0, 8)</code></td></tr> </tbody></table> <p>Converted to an x86 assembly file it looks like this (Intel syntax):</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>section .multiboot_header </span><span>header_start: </span><span> dd </span><span style="color:#b4cea8;">0xe85250d6</span><span style="color:#608b4e;"> ; magic number (multiboot 2) </span><span> dd </span><span style="color:#b4cea8;">0</span><span style="color:#608b4e;"> ; architecture 0 (protected mode i386) </span><span> dd header_end - header_start</span><span style="color:#608b4e;"> ; header length </span><span 
style="color:#608b4e;"> ; checksum </span><span> dd </span><span style="color:#b4cea8;">0x100000000 </span><span>- (</span><span style="color:#b4cea8;">0xe85250d6 </span><span>+ </span><span style="color:#b4cea8;">0 </span><span>+ (header_end - header_start)) </span><span> </span><span style="color:#608b4e;"> ; insert optional multiboot tags here </span><span> </span><span style="color:#608b4e;"> ; required end tag </span><span> dw </span><span style="color:#b4cea8;">0</span><span style="color:#608b4e;"> ; type </span><span> dw </span><span style="color:#b4cea8;">0</span><span style="color:#608b4e;"> ; flags </span><span> dd </span><span style="color:#b4cea8;">8</span><span style="color:#608b4e;"> ; size </span><span>header_end: </span></code></pre> <p>If you don’t know x86 assembly, here is a quick guide:</p> <ul> <li>the header will be written to a section named <code>.multiboot_header</code> (we need this later)</li> <li><code>header_start</code> and <code>header_end</code> are <em>labels</em> that mark a memory location; we use them to calculate the header length easily</li> <li><code>dd</code> stands for <code>define double</code> (32-bit) and <code>dw</code> stands for <code>define word</code> (16-bit). They just output the specified 32-bit/16-bit constant.</li> <li>the additional <code>0x100000000</code> in the checksum calculation is a small hack<sup class="footnote-reference"><a href="#fn-checksum_hack">1</a></sup> to avoid a compiler warning</li> </ul> <p>We can already <em>assemble</em> this file (which I called <code>multiboot_header.asm</code>) using <code>nasm</code>. 
It produces a flat binary by default, so the resulting file just contains our 24 bytes (in little endian if you work on an x86 machine):</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; nasm multiboot_header.asm </span><span>&gt; hexdump -x multiboot_header </span><span>0000000 50d6 e852 0000 0000 0018 0000 af12 17ad </span><span>0000010 0000 0000 0008 0000 </span><span>0000018 </span></code></pre> <h2 id="the-boot-code"><a class="zola-anchor" href="#the-boot-code" aria-label="Anchor link for: the-boot-code">🔗</a>The Boot Code</h2> <p>To boot our kernel, we must add some code that the bootloader can call. Let’s create a file named <code>boot.asm</code>:</p> <pre data-lang="nasm" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-nasm "><code class="language-nasm" data-lang="nasm"><span>global start </span><span> </span><span>section .text </span><span>bits 32 </span><span>start: </span><span style="color:#608b4e;"> ; print `OK` to screen </span><span> </span><span style="color:#569cd6;">mov </span><span>dword [</span><span style="color:#b4cea8;">0xb8000</span><span>], </span><span style="color:#b4cea8;">0x2f4b2f4f </span><span> </span><span style="color:#569cd6;">hlt </span></code></pre> <p>There are some new commands:</p> <ul> <li><code>global</code> exports a label (makes it public). As <code>start</code> will be the entry point of our kernel, it needs to be public.</li> <li>the <code>.text</code> section is the default section for executable code</li> <li><code>bits 32</code> specifies that the following lines are 32-bit instructions. It’s needed because the CPU is still in <a href="https://en.wikipedia.org/wiki/Protected_mode">Protected mode</a> when GRUB starts our kernel. 
When we switch to <a href="https://en.wikipedia.org/wiki/Long_mode">Long mode</a> in the <a href="https://os.phil-opp.com/entering-longmode/">next post</a> we can use <code>bits 64</code> (64-bit instructions).</li> <li>the <code>mov dword</code> instruction moves the 32-bit constant <code>0x2f4b2f4f</code> to the memory at address <code>0xb8000</code> (it prints <code>OK</code> to the screen, an explanation follows in the next posts)</li> <li><code>hlt</code> is the halt instruction and causes the CPU to stop</li> </ul> <p>Through assembling, viewing and disassembling we can see the CPU <a href="https://en.wikipedia.org/wiki/Opcode">Opcodes</a> in action:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; nasm boot.asm </span><span>&gt; hexdump -x boot </span><span>0000000 05c7 8000 000b 2f4b 2f4f 00f4 </span><span>000000b </span><span>&gt; ndisasm -b 32 boot </span><span>00000000 C70500800B004B2F mov dword [dword 0xb8000],0x2f4b2f4f </span><span> -4F2F </span><span>0000000A F4 hlt </span></code></pre> <h2 id="building-the-executable"><a class="zola-anchor" href="#building-the-executable" aria-label="Anchor link for: building-the-executable">🔗</a>Building the Executable</h2> <p>To boot our executable later through GRUB, it should be an <a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">ELF</a> executable. So we want <code>nasm</code> to create ELF <a href="https://wiki.osdev.org/Object_Files">object files</a> instead of plain binaries. To do that, we simply pass the <code>‑f elf64</code> argument to it.</p> <p>To create the ELF <em>executable</em>, we need to <a href="https://en.wikipedia.org/wiki/Linker_(computing)">link</a> the object files together. 
We use a custom <a href="https://sourceware.org/binutils/docs/ld/Scripts.html">linker script</a> named <code>linker.ld</code>:</p> <pre data-lang="ld" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-ld "><code class="language-ld" data-lang="ld"><span style="color:#569cd6;">ENTRY</span><span>(start) </span><span> </span><span style="color:#569cd6;">SECTIONS</span><span> { </span><span> . = </span><span style="color:#b5cea8;">1M</span><span>; </span><span> </span><span> .boot : </span><span> { </span><span> </span><span style="color:#608b4e;">/* ensure that the multiboot header is at the beginning */ </span><span> *(.multiboot_header) </span><span> } </span><span> </span><span> .text : </span><span> { </span><span> *(.text) </span><span> } </span><span>} </span></code></pre> <p>Let’s translate it:</p> <ul> <li><code>start</code> is the entry point, the bootloader will jump to it after loading the kernel</li> <li><code>. = 1M;</code> sets the load address of the first section to 1 MiB, which is a conventional place to load a kernel<sup class="footnote-reference"><a href="#Linker 1M">2</a></sup></li> <li>the executable will have two sections: <code>.boot</code> at the beginning and <code>.text</code> afterwards</li> <li>the <code>.text</code> output section contains all input sections named <code>.text</code></li> <li>Sections named <code>.multiboot_header</code> are added to the first output section (<code>.boot</code>) to ensure they are at the beginning of the executable. 
This is necessary because GRUB expects to find the Multiboot header very early in the file.</li> </ul> <p>So let’s create the ELF object files and link them using our new linker script:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; nasm -f elf64 multiboot_header.asm </span><span>&gt; nasm -f elf64 boot.asm </span><span>&gt; ld -n -o kernel.bin -T linker.ld multiboot_header.o boot.o </span></code></pre> <p>It’s important to pass the <code>-n</code> (or <code>--nmagic</code>) flag to the linker, which disables the automatic section alignment in the executable. Otherwise the linker may page align the <code>.boot</code> section in the executable file. If that happens, GRUB isn’t able to find the Multiboot header because it isn’t at the beginning anymore.</p> <p>We can use <code>objdump</code> to print the sections of the generated executable and verify that the <code>.boot</code> section has a low file offset:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>&gt; objdump -h kernel.bin </span><span>kernel.bin: file format elf64-x86-64 </span><span> </span><span>Sections: </span><span>Idx Name Size VMA LMA File off Algn </span><span> 0 .boot 00000018 0000000000100000 0000000000100000 00000080 2**0 </span><span> CONTENTS, ALLOC, LOAD, READONLY, DATA </span><span> 1 .text 0000000b 0000000000100020 0000000000100020 000000a0 2**4 </span><span> CONTENTS, ALLOC, LOAD, READONLY, CODE </span></code></pre> <p><em>Note</em>: The <code>ld</code> and <code>objdump</code> commands are platform specific. If you’re <em>not</em> working on x86_64 architecture, you will need to <a href="https://os.phil-opp.com/cross-compile-binutils/">cross compile binutils</a>. 
Then use <code>x86_64‑elf‑ld</code> and <code>x86_64‑elf‑objdump</code> instead of <code>ld</code> and <code>objdump</code>.</p> <h2 id="creating-the-iso"><a class="zola-anchor" href="#creating-the-iso" aria-label="Anchor link for: creating-the-iso">🔗</a>Creating the ISO</h2> <p>All PC BIOSes know how to boot from a CD-ROM, so we want to create a bootable CD-ROM image, containing our kernel and the GRUB bootloader’s files, in a single file called an <a href="https://en.wikipedia.org/wiki/ISO_image">ISO</a>. Make the following directory structure and copy the <code>kernel.bin</code> to the right place:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>isofiles </span><span>└── boot </span><span> ├── grub </span><span> │ └── grub.cfg </span><span> └── kernel.bin </span><span> </span></code></pre> <p>The <code>grub.cfg</code> specifies the file name of our kernel and its Multiboot 2 compliance. It looks like this:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>set timeout=0 </span><span>set default=0 </span><span> </span><span>menuentry &quot;my os&quot; { </span><span> multiboot2 /boot/kernel.bin </span><span> boot </span><span>} </span></code></pre> <p>Now we can create a bootable image using the command:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>grub-mkrescue -o os.iso isofiles </span></code></pre> <p><em>Note</em>: <code>grub-mkrescue</code> causes problems on some platforms. If it does not work for you, try the following steps:</p> <ul> <li>try to run it with <code>--verbose</code></li> <li>make sure <code>xorriso</code> is installed (<code>xorriso</code> or <code>libisoburn</code> package)</li> <li>If you’re using an EFI-system, <code>grub-mkrescue</code> tries to create an EFI image by default. 
You can either pass <code>-d /usr/lib/grub/i386-pc</code> to avoid EFI or install the <code>mtools</code> package to get a working EFI image</li> <li>on some systems the command is named <code>grub2-mkrescue</code></li> </ul> <h2 id="booting"><a class="zola-anchor" href="#booting" aria-label="Anchor link for: booting">🔗</a>Booting</h2> <p>Now it’s time to boot our OS. We will use <a href="https://en.wikipedia.org/wiki/QEMU">QEMU</a>:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>qemu-system-x86_64 -cdrom os.iso </span></code></pre> <p><img src="https://os.phil-opp.com/multiboot-kernel/qemu-ok.png" alt="qemu output" /></p> <p>Notice the green <code>OK</code> in the upper left corner. If it does not work for you, take a look at the comment section.</p> <p>Let’s summarize what happens:</p> <ol> <li>the BIOS loads the bootloader (GRUB) from the virtual CD-ROM (the ISO)</li> <li>the bootloader reads the kernel executable and finds the Multiboot header</li> <li>it copies the <code>.boot</code> and <code>.text</code> sections to memory (to addresses <code>0x100000</code> and <code>0x100020</code>)</li> <li>it jumps to the entry point (<code>0x100020</code>, you can obtain it through <code>objdump -f</code>)</li> <li>our kernel prints the green <code>OK</code> and stops the CPU</li> </ol> <p>You can test it on real hardware, too. Just burn the ISO to a disc or write it to a USB stick and boot from it.</p> <h2 id="build-automation"><a class="zola-anchor" href="#build-automation" aria-label="Anchor link for: build-automation">🔗</a>Build Automation</h2> <p>Right now we need to execute 4 commands in the right order every time we change a file. That’s bad. So let’s automate the build using a <code>Makefile</code>. 
But first we should create a clean directory structure for our source files to separate the architecture-specific files:</p> <pre style="background-color:#1e1e1e;color:#dcdcdc;"><code><span>… </span><span>├── Makefile </span><span>└── src </span><span> └── arch </span><span> └── x86_64 </span><span> ├── multiboot_header.asm </span><span> ├── boot.asm </span><span> ├── linker.ld </span><span> └── grub.cfg </span></code></pre> <p>The Makefile looks like this (indented with tabs instead of spaces):</p> <pre data-lang="Makefile" style="background-color:#1e1e1e;color:#dcdcdc;" class="language-Makefile "><code class="language-Makefile" data-lang="Makefile"><span>arch ?= </span><span style="background-color:#282828;color:#d69d85;">x86_64</span><span> </span><span>kernel := </span><span style="background-color:#282828;color:#d69d85;">build/kernel-</span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">arch</span><span style="background-color:#282828;color:#569cd6;">)</span><span style="background-color:#282828;color:#d69d85;">.bin</span><span> </span><span>iso := </span><span style="background-color:#282828;color:#d69d85;">build/os-</span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">arch</span><span style="background-color:#282828;color:#569cd6;">)</span><span style="background-color:#282828;color:#d69d85;">.iso</span><span> </span><span> </span><span>linker_script := </span><span style="background-color:#282828;color:#d69d85;">src/arch/</span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">arch</span><span style="background-color:#282828;color:#569cd6;">)</span><span style="background-color:#282828;color:#d69d85;">/linker.ld</span><span> </span><span>grub_cfg := </span><span style="background-color:#282828;color:#d69d85;">src/arch/</span><span 
style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">arch</span><span style="background-color:#282828;color:#569cd6;">)</span><span style="background-color:#282828;color:#d69d85;">/grub.cfg</span><span> </span><span>assembly_source_files := </span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">wildcard </span><span style="background-color:#282828;color:#d69d85;">src/arch/</span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">arch</span><span style="background-color:#282828;color:#569cd6;">)</span><span style="background-color:#282828;color:#d69d85;">/</span><span style="background-color:#282828;color:#dcdcdc;">*</span><span style="background-color:#282828;color:#d69d85;">.asm</span><span style="background-color:#282828;color:#569cd6;">)</span><span> </span><span>assembly_object_files := </span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">patsubst </span><span style="background-color:#282828;color:#d69d85;">src/arch/</span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">arch</span><span style="background-color:#282828;color:#569cd6;">)</span><span style="background-color:#282828;color:#d69d85;">/</span><span style="background-color:#282828;color:#dcdcdc;">%</span><span style="background-color:#282828;color:#d69d85;">.asm, \ </span><span style="background-color:#282828;color:#d69d85;"> build/arch/</span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">arch</span><span style="background-color:#282828;color:#569cd6;">)</span><span style="background-color:#282828;color:#d69d85;">/</span><span style="background-color:#282828;color:#dcdcdc;">%</span><span 
style="background-color:#282828;color:#d69d85;">.o, </span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">assembly_source_files</span><span style="background-color:#282828;color:#569cd6;">))</span><span> </span><span> </span><span>.PHONY: </span><span style="background-color:#282828;color:#d69d85;">all clean run iso</span><span> </span><span> </span><span>all: </span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">kernel</span><span style="background-color:#282828;color:#569cd6;">)</span><span> </span><span> </span><span>clean: </span><span> </span><span style="color:#569cd6;">@</span><span>rm -r build </span><span> </span><span>run: </span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">iso</span><span style="background-color:#282828;color:#569cd6;">)</span><span> </span><span> </span><span style="color:#569cd6;">@</span><span>qemu-system-x86_64 -cdrom </span><span style="color:#569cd6;">$(</span><span>iso</span><span style="color:#569cd6;">) </span><span> </span><span>iso: </span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">iso</span><span style="background-color:#282828;color:#569cd6;">)</span><span> </span><span> </span><span style="color:#569cd6;">$(</span><span>iso</span><span style="color:#569cd6;">)</span><span>: </span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">kernel</span><span style="background-color:#282828;color:#569cd6;">) $(</span><span style="background-color:#282828;color:#dcdcdc;">grub_cfg</span><span style="background-color:#282828;color:#569cd6;">)</span><span> </span><span> </span><span style="color:#569cd6;">@</span><span>mkdir -p build/isofiles/boot/grub </span><span> </span><span 
style="color:#569cd6;">@</span><span>cp </span><span style="color:#569cd6;">$(</span><span>kernel</span><span style="color:#569cd6;">)</span><span> build/isofiles/boot/kernel.bin </span><span> </span><span style="color:#569cd6;">@</span><span>cp </span><span style="color:#569cd6;">$(</span><span>grub_cfg</span><span style="color:#569cd6;">)</span><span> build/isofiles/boot/grub </span><span> </span><span style="color:#569cd6;">@</span><span>grub-mkrescue -o </span><span style="color:#569cd6;">$(</span><span>iso</span><span style="color:#569cd6;">)</span><span> build/isofiles </span><span style="color:#b5cea8;">2</span><span>&gt; /dev/null </span><span> </span><span style="color:#569cd6;">@</span><span>rm -r build/isofiles </span><span> </span><span style="color:#569cd6;">$(</span><span>kernel</span><span style="color:#569cd6;">)</span><span>: </span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">assembly_object_files</span><span style="background-color:#282828;color:#569cd6;">) $(</span><span style="background-color:#282828;color:#dcdcdc;">linker_script</span><span style="background-color:#282828;color:#569cd6;">)</span><span> </span><span> </span><span style="color:#569cd6;">@</span><span>ld -n -T </span><span style="color:#569cd6;">$(</span><span>linker_script</span><span style="color:#569cd6;">)</span><span> -o </span><span style="color:#569cd6;">$(</span><span>kernel</span><span style="color:#569cd6;">) $(</span><span>assembly_object_files</span><span style="color:#569cd6;">) </span><span> </span><span style="color:#608b4e;"># compile assembly files </span><span>build/arch/</span><span style="color:#569cd6;">$(</span><span>arch</span><span style="color:#569cd6;">)</span><span>/%.o: </span><span style="background-color:#282828;color:#d69d85;">src/arch/</span><span style="background-color:#282828;color:#569cd6;">$(</span><span style="background-color:#282828;color:#dcdcdc;">arch</span><span 
style="background-color:#282828;color:#569cd6;">)</span><span style="background-color:#282828;color:#d69d85;">/</span><span style="background-color:#282828;color:#dcdcdc;">%</span><span style="background-color:#282828;color:#d69d85;">.asm</span><span> </span><span> </span><span style="color:#569cd6;">@</span><span>mkdir -p </span><span style="color:#569cd6;">$(</span><span>shell dirname $@</span><span style="color:#569cd6;">) </span><span> </span><span style="color:#569cd6;">@</span><span>nasm -felf64 $&lt; -o $@ </span></code></pre> <p>Some comments (see a Makefile tutorial if you don’t know <code>make</code>):</p> <ul> <li>the <code>$(wildcard src/arch/$(arch)/*.asm)</code> chooses all assembly files in the <code>src/arch/$(arch)</code> directory, so you don’t have to update the Makefile when you add a file</li> <li>the <code>patsubst</code> operation for <code>assembly_object_files</code> just translates <code>src/arch/$(arch)/XYZ.asm</code> to <code>build/arch/$(arch)/XYZ.o</code></li> <li>the <code>$&lt;</code> and <code>$@</code> in the assembly target are <a href="https://www.gnu.org/software/make/manual/html_node/Automatic-Variables.html">automatic variables</a></li> <li>if you’re using <a href="https://os.phil-opp.com/cross-compile-binutils/">cross-compiled binutils</a> just replace <code>ld</code> with <code>x86_64‑elf‑ld</code></li> </ul> <p>Now we can invoke <code>make</code> and all updated assembly files are compiled and linked. 
The <code>make iso</code> command also creates the ISO image and <code>make run</code> will additionally start QEMU.</p> <h2 id="what-s-next"><a class="zola-anchor" href="#what-s-next" aria-label="Anchor link for: what-s-next">🔗</a>What’s next?</h2> <p>In the <a href="https://os.phil-opp.com/entering-longmode/">next post</a> we will create a page table and do some CPU configuration to switch to the 64-bit <a href="https://en.wikipedia.org/wiki/Long_mode">long mode</a>.</p> <h2 id="footnotes"><a class="zola-anchor" href="#footnotes" aria-label="Anchor link for: footnotes">🔗</a>Footnotes</h2> <div class="footnote-definition" id="fn-checksum_hack"><sup class="footnote-definition-label">1</sup> <p>The formula from the table, <code>-(magic + architecture + header_length)</code>, creates a negative value that doesn’t fit into 32 bits. By subtracting from <code>0x100000000</code> (= 2<sup>32</sup>) instead, we keep the value positive without changing its truncated value. Without the additional sign bit(s) the result fits into 32 bits and the compiler is happy :).</p> </div> <div class="footnote-definition" id="Linker 1M"><sup class="footnote-definition-label">2</sup> <p>We don’t want to load the kernel to e.g. <code>0x0</code> because there are many special memory areas below the 1MB mark (for example the so-called VGA buffer at <code>0xb8000</code>, which we use to print <code>OK</code> to the screen).</p> </div>
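<p>The checksum trick from footnote 1 can be checked numerically. The following is only a minimal sketch in Rust, not code from the post: the function names are hypothetical, <code>0xe85250d6</code> is the Multiboot 2 magic number, and the header length of 24 bytes is an assumption (four 4-byte fields plus an 8-byte end tag). It compares the subtraction from <code>0x100000000</code> with a plain 32-bit wrapping negation and verifies that all header fields sum to zero modulo 2<sup>32</sup>.</p>

```rust
// Assumed header values; adjust them if your header differs.
const MAGIC: u32 = 0xe85250d6; // Multiboot 2 magic number
const ARCHITECTURE: u32 = 0; // 0 = protected mode i386
const HEADER_LENGTH: u32 = 24; // assumption: 4 fields of 4 bytes + 8-byte end tag

/// Footnote variant: subtract the sum from 2^32 in a wider type, then truncate.
fn checksum_via_subtraction(magic: u32, arch: u32, len: u32) -> u32 {
    let sum = magic as u64 + arch as u64 + len as u64;
    // wrapping_sub keeps this well-defined even if the sum exceeds 2^32;
    // the cast to u32 keeps only the low 32 bits.
    0x1_0000_0000u64.wrapping_sub(sum) as u32
}

/// Table variant: -(magic + architecture + header_length) in 32-bit wrapping arithmetic.
fn checksum_via_negation(magic: u32, arch: u32, len: u32) -> u32 {
    magic.wrapping_add(arch).wrapping_add(len).wrapping_neg()
}

fn main() {
    let a = checksum_via_subtraction(MAGIC, ARCHITECTURE, HEADER_LENGTH);
    let b = checksum_via_negation(MAGIC, ARCHITECTURE, HEADER_LENGTH);
    // Both formulas agree after truncation to 32 bits.
    assert_eq!(a, b);

    // magic + architecture + header_length + checksum must be 0 modulo 2^32.
    let total = MAGIC
        .wrapping_add(ARCHITECTURE)
        .wrapping_add(HEADER_LENGTH)
        .wrapping_add(a);
    assert_eq!(total, 0);

    println!("checksum = {:#010x}", a);
}
```

<p>Both variants yield the same 32-bit value, which is why the subtraction form can be used when the assembler complains about the negative constant.</p>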