A Systems Programmer's Perspectives on Generative AI<p>Alex discusses his experience playing with the current crop of large language models and muses on the power of processors multiplying lots of numbers together.</p><p>Like many people over the last few months I've been playing with a number of Large Language Models (LLMs). LLMs are
perhaps best typified by the current media star ChatGPT. It is hard to avoid the buzz while every tech
titan is developing its "AI" play and people are exposed to tools where the label of Artificial Intelligence is
liberally applied. The ability of these models to spit out <a href="https://www.bennee.com/~alex/blog/2023/10/22/comparing-forge-based-and-email-based-workflow-for-open-source-projects/">competent comprehensible
text</a> is seemingly a step change in ability compared to previous generations of tech.</p>
<p>I thought I would try and collect some of my thoughts and perspectives on this from the point of view of a <a href="https://en.wikipedia.org/wiki/Systems_programming" title="link to wikipedia definition">systems
programmer</a>. For those not familiar
with the term, it refers to the low level work of providing the platforms on which the applications people actually use run. In
my case a lot of the work I do is on <a href="https://www.qemu.org" title="link to QEMU homepage">QEMU</a>, which involves emulating the very
lowest level instructions a computer can do: the simple arithmetic and comparison of numbers that all code is eventually
expressed as.</p>
<h1>Magic numbers and computing them</h1>
<p>I claim no particular expertise in machine learning, so expect this to be a very superficial explanation of what's going
on.</p>
<p>In normal code the CPU tends to execute a lot of different instruction sequences as a program runs through solving the
problem you have set it. The code that calculates where to draw your window will be different to the code checking the
network for new data, or the logic that stores information safely on your file system. Each of those tasks is decomposed
and abstracted into simpler and simpler steps until eventually it is simple arithmetic dictating what the processor
should do next. You occasionally see hot spots where a particular sequence of instructions is doing a lot of heavy
lifting. There is a whole discipline devoted to managing computational complexity and ensuring algorithms are as
efficient as possible.</p>
<p>However the various technologies that are currently wowing the world work very differently. They are models of various
networks represented by a series of magic numbers or "weights" arranged in a hierarchical structure of interconnected
<a href="https://en.wikipedia.org/wiki/Matrix_(mathematics)">matrices</a>. While there is a lot of nuance to how problems are
encoded and fed into these models, fundamentally the core piece of computation is multiplying a bunch of numbers with
another bunch of numbers and feeding their results into the next layer of the network. At the end of the process the model
spits out a prediction of what the most likely next word is going to be. After selecting one the cycle repeats, taking into
account our expanded context to predict the word after that.</p>
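<p>To make that concrete, here is a toy sketch of the multiply-and-predict loop. Nothing here resembles a real model - the tiny shapes, the random "weights" and the five word vocabulary are all made up for illustration - but the heart of the computation really is just repeated matrix multiplies followed by picking the highest scoring word:</p>
<pre class="literal-block">
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "quick", "brown", "fox", "jumps"]
embed = rng.standard_normal((len(vocab), 16))               # word number to vector
layers = [rng.standard_normal((16, 16)) for _ in range(4)]  # the "magic numbers"
unembed = rng.standard_normal((16, len(vocab)))             # vector back to word scores

context = [0]                                               # start with "the"
for _ in range(4):
    x = embed[context].mean(axis=0)                         # crude summary of the context
    for w in layers:
        x = np.maximum(x @ w, 0)                            # multiply, then a simple non-linearity
    scores = x @ unembed
    probs = np.exp(scores) / np.exp(scores).sum()           # turn scores into probabilities
    context.append(int(probs.argmax()))                     # pick the most likely "next word"

print(" ".join(vocab[i] for i in context))
</pre>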
<p>The "models" that drive these things are described mostly by the number of parameters they have. This encompasses the
number of inputs and outputs they have and the number of numbers in between. For example common small open source models
start at 3 billion parameters with 7, 13 and 34 billion also being popular sizes. Beyond that it starts getting hard to
run models locally on all but the most tricked out desktop PCs. As a developer my desktop is pretty beefy (32 cores,
64GB RAM) and can chew through computationally expensive builds pretty easily. However as I can't off-load processing
onto my GPU a decent sized model will chug out a few words a second while maxing out my CPU. The GPT-4 model behind ChatGPT is
speculated to have about 1.7 trillion parameters and needs expensive cloud hardware to run - I certainly don't
envy <a href="https://openai.com/">OpenAI</a> their infrastructure bill.</p>
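<p>Some back-of-envelope arithmetic shows why. Assuming roughly 2 bytes per parameter (16 bit floats) - a simplification, as precision and overheads vary - just holding the weights looks something like this:</p>
<pre class="literal-block">
# rough memory footprint of the weights alone, at 2 bytes per parameter
for params in (3e9, 7e9, 13e9, 34e9, 1.7e12):
    print("%5.0f billion parameters: roughly %.0f GB of weights" % (params / 1e9, params * 2 / 1e9))
</pre>
<p>Which is why a 7 billion parameter model sits comfortably on a well specified desktop while the speculated GPT-4 sized models are firmly data centre territory.</p>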
<p>Of course the computational power needed to run these models is a mere fraction of what it took to train them. In fact
the bandwidth and processing requirements are so large it pays to develop custom silicon that is really good at
multiplying large amounts of numbers and not much else. You can get a lot more bang for your buck compared to running
those calculations on a general purpose CPU designed for tackling a wide range of computation problems.</p>
<h1>The Value of Numbers</h1>
<p>Because of the massive investment in synthesising these magic numbers they themselves become worth something. The "magic
sauce" behind a model is more about how it was trained and what data was used to do it. We already know its possible to
encode societies biases into models due to sloppy selection of the input data. One of the principle criticisms of
proprietary generative models is how opaque the training methods are making it hard to judge their safety. The degree to
which models may regurgitate data without any transformation is hard to quantify when you don't know what went into it.</p>
<p>As I'm fundamentally more interested in knowing how the technology I use works under the hood it's fortunate there is a
growing open source community working on building their own models. Credit should be given to Meta who made their
language model <a href="https://ai.meta.com/llama/" title="link to Meta's Llama page">LLaMA 2</a> freely available on fairly permissive
terms. Since then there has been an explosion of open source projects that can run the models (e.g:
<a href="https://github.com/ggerganov/llama.cpp" title="link to the llama.cpp project for running CPU bound models">llama.cpp</a>,
<a href="https://ollama.ai/" title="link to Ollama, another tool for locally running models">Ollama</a>) and provide front-ends (e.g:
<a href="https://github.com/oobabooga/text-generation-webui" title="link to Oobabooga's text generation UI">Oobabooga's text generation UI</a>, <a href="https://github.com/s-kostyaev/ellama" title="link to the Ellama github page">Ellama front-end for Emacs</a>) for them.</p>
<h1>Smaller Magic Numbers</h1>
<p>The principal place where this work is going on is <a href="https://huggingface.co/" title="link to the Hugging Face website">Hugging Face</a>. Think of it as the <a href="https://github.com" title="GitHub">GitHub</a> of the machine learning community. It provides an
environment for publishing and collaborating on data sets and models as well as hosting and testing their effectiveness in
various benchmarks. This makes experimenting with models accessible to developers who aren't part of the well funded
research divisions of the various tech titans. Datasets for example come with
<a href="https://huggingface.co/docs/hub/datasets-cards">cards</a> which describe the sources that went into these multi-terabyte
files.</p>
<p>One example of such a dataset is the <a href="https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T" title="link to the RedPajama dataset">RedPajama dataset</a>. This is an open source initiative to recreate the LLaMA training data which combines
data from the open web as well as numerous permissively licensed sources such as Wikipedia, GitHub, StackExchange and
ArXiv. This dataset has been used to train models like <a href="https://huggingface.co/openlm-research" title="link to OpenLM research Hugging Face pages">OpenLLaMA</a> in an attempt to provide an unencumbered version of Meta's LLaMA models. However
training up these foundational models is an expensive and time consuming task; the real action is in taking these models
and then fine tuning them for particular tasks.</p>
<p>To fine tune a model you first take a general purpose model and further train it against data with a specific task in
mind. The purpose of this is not only to make your new model better suited for a particular task but also to optimise
the number of calculations that model has to do to achieve acceptable results. This is also where the style of prompting
will be set as you feed the model examples of the sort of questions and answers you want it to give.</p>
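<p>As a rough illustration of the shape of the process, here is a minimal fine tuning sketch using the Hugging Face libraries. The checkpoint name and the train_dataset variable are stand-ins for whatever task data you have prepared - this is a sketch of the idea rather than a recipe I've run end to end:</p>
<pre class="literal-block">
# minimal fine tuning sketch; "train_dataset" is assumed to be a tokenised
# set of example prompts and answers for your particular task
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

base = "openlm-research/open_llama_3b"           # any causal language model checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

args = TrainingArguments(output_dir="finetuned",
                         per_device_train_batch_size=1,
                         num_train_epochs=1)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()                                  # further training on the task data
trainer.save_model("finetuned")
</pre>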
<p>There are further stages that can be applied, including "alignment" where you ensure results are broadly in tune with the
values of the organisation. This is the reason the various chatbots around won't readily cough up the recipe to build
nukes or make it easier to explicitly break the law. This can be augmented with Reinforcement Learning from Human
Feedback (RLHF) which is practically the purpose of every <a href="https://en.wikipedia.org/wiki/CAPTCHA">CAPTCHA</a> you'll have
filled in over the last 25 years online.</p>
<p>Finally the model can be quantised to make it more manageable. This takes advantage of the fact that a lot of the
numbers will have a negligible effect on the result for a wide range of inputs. In those cases there is no point
storing them at full precision. As computation is a function of the number of bits of information being processed this
also reduces the cost of computation. While phones and other devices are increasingly including dedicated hardware to
process these models they are still constrained by physics - and the more you process the more heat you need to
dissipate, the more battery you use and the more bandwidth you consume. Obviously the more aggressively you quantise the
model the worse it will perform, so there is an engineering trade off to make. Phones work best with multiple highly
tuned models solving specific tasks as efficiently as possible. Fully flexible models giving a
<a href="https://en.wikipedia.org/wiki/J.A.R.V.I.S.">J.A.R.V.I.S</a> like experience will probably always need to run in the cloud
where thermal management is simply an exercise in plumbing.</p>
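<p>A toy version of the quantisation idea - real schemes are considerably cleverer about grouping weights and handling outliers - is to store a weight matrix as 8 bit integers plus a single scale factor:</p>
<pre class="literal-block">
import numpy as np

weights = np.random.randn(4096, 4096).astype(np.float32)   # one fp32 weight matrix
scale = np.abs(weights).max() / 127.0                       # map the largest magnitude to 127
q = np.round(weights / scale).astype(np.int8)               # a quarter of the storage

restored = q.astype(np.float32) * scale                     # approximate reconstruction at run time
print("worst case error: %.4f" % np.abs(weights - restored).max())
print("fp32 bytes: %d, int8 bytes: %d" % (weights.nbytes, q.nbytes))
</pre>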
<h1>Making magic numbers work for you</h1>
<p>Before we get on to using models I want to discuss 3 more concepts: "prompts", "context" and "hallucinations".</p>
<p>The prompt is the closest thing there is to "programming" the model. The prompt can be purely explicit or include other
inputs behind the scenes. For example the prompt can instruct the model to be friendly or terse, decorate code snippets
with markdown, make changes as diffs or in full functions. Generally the more explicit your prompt is about what you
want the better the result you get from the model. <a href="https://en.wikipedia.org/wiki/Prompt_engineering">Prompt
engineering</a> has the potential to be one of those newly created job
titles that will have to replace the jobs obsoleted by advancing AI. One of the ways to embed AI APIs into your app is
to create a task specific prompt that will be put in front of user input that guides the results to what you want.</p>
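<p>In its simplest form that embedding is little more than string concatenation before the text is handed to whichever completion API or front-end you are using. A sketch, with the wording of the prompt purely illustrative:</p>
<pre class="literal-block">
# prepend a task specific prompt to whatever the user supplies; the exact
# wording here is made up for illustration
TASK_PROMPT = (
    "You are a terse code review assistant. Only report actual bugs or "
    "undefined behaviour in the code below, as a numbered list. "
    "Do not comment on style or naming.\n\n"
)

def build_review_request(user_code):
    return TASK_PROMPT + user_code

print(build_review_request("int idx = strlen(s) - 1;  /* what if s is empty? */"))
</pre>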
<p>The "context" is the rest of the input into the model. That could be the current conversation in a chat or the current
page of source code in a code editor. The larger the context the more reference the model has for its answer although
that does come at the cost of even more computation as a larger context means more input for the model to process.</p>
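<p>A crude way to picture this is a chat loop where the whole conversation so far is fed back in on every turn - here simply counting words as a stand-in for real tokens:</p>
<pre class="literal-block">
# each exchange is appended and the whole lot goes back through the model,
# so the amount of input to process grows turn by turn
context = "You are a helpful assistant.\n"
exchanges = [
    ("What does QEMU do?", "QEMU emulates machines..."),
    ("How does TCG work?", "TCG translates guest instructions..."),
]
for question, answer in exchanges:
    context += "User: %s\nAssistant: %s\n" % (question, answer)
    print("input for next turn: %d words" % len(context.split()))
</pre>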
<p>In a strong candidate for 2023's word of the year "hallucination" describes the quirky and sometimes unsettling behaviour
of models outputting weird, sometimes contradictory information. They will sincerely and confidently answer questions
with blatant lies or start <a href="https://www.theregister.com/2023/12/01/chatgpt_poetry_ai/">regurgitating training data</a> when
given certain prompts. It is a salient reminder that the statistical nature of these generative models will mean they
occasionally spout complete rubbish. They are also very prone to following the lead of their users - the longer you chat
with a model the more likely it is to end up agreeing with you.</p>
<p>So let's talk about what these models can and can't do. As a developer one of the areas I'm most interested in is their
ability to write code. Systems code especially is an exercise in precisely instructing a computer what to do in explicit
situations. I'd confidently predicted my job would be one of the last to succumb to the advance of AI as systems aren't
something you can get "mostly" right. It was quite a shock when I first saw just how sophisticated the generated code
can be.</p>
<h2>Code Review</h2>
<p>One of the first things I asked ChatGPT to do was review a function I'd written. It managed to make 6 observations about
the code, 3 of which were actual logic problems I'd missed and 3 of which were general points about variable naming and
comments. The prompt is pretty important though. If not constrained to point out actual problems LLMs have a
tendency to spit out rather generic advice about writing clean well commented code.</p>
<p>They can be super useful when working with an unfamiliar language or framework. If you are having trouble getting
something to work it might be faster to ask an LLM how to fix your function than spending time reading multiple
<a href="https://stackoverflow.com/">StackOverflow</a> answers to figure out what you've misunderstood. If compiler errors are
confusing, supplying the message alongside the code can often be helpful in understanding what's going on.</p>
<h2>Writing Code</h2>
<p>However rather than just suggesting changes one very tempting use case is writing code from scratch based on a
description of what you want. Here the context is very important, the more detail you provide the better chance of
generating something useful. My experience has been that the solutions are usually fairly rudimentary and can often
benefit from a manual polishing step once you have something working.</p>
<p>For my <a href="https://www.bennee.com/~alex/presentations/kvm23-qemu-keynote.html">QEMU KVM Forum 2023 Keynote</a> I got ChatGPT
to write the first draft of a number of my data processing scripts. However it missed obvious optimisations by
repeatedly reading values inside inner loops that made the scripts slower than they needed to be.</p>
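<p>A hypothetical example of the kind of thing it missed - re-reading an invariant value inside the inner loop rather than hoisting it out:</p>
<pre class="literal-block">
# hypothetical illustration, not the actual generated script
def total_generated(rows, config):
    total = 0
    for row in rows:
        for value in row:
            scale = config["scale"]   # looked up on every single iteration
            total += value * scale
    return total

def total_polished(rows, config):
    scale = config["scale"]           # read the invariant once, outside the loops
    total = 0
    for row in rows:
        for value in row:
            total += value * scale
    return total
</pre>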
<p>If the task is a straight transformation they are very good. Ask an LLM to convert a function in one language into
another and it will do a pretty good job - and probably with fewer mistakes than your first attempt. However there are
limitations. For example I asked a model to convert some AArch64 assembler into the equivalent 32 bit Arm assembler. It
did a very good job of the mechanical part of that but missed the subtle differences in how to setup the MMU. This
resulted in code which compiled but didn't work until debugged by a human who was paying close attention to the
architecture documentation as they went.</p>
<p>One of the jobs LLMs are very well suited to is writing code that matches an existing template. For example if you are
mechanically transforming a bunch of enums into a function to convert them to strings you need only do a few examples
before there is enough context for the LLM to reliably figure out what you are doing. LLMs are a lot more powerful than
a simple template expansion because you don't need to explicitly define a template first. The same is true of tasks like
generating test fixtures for your code.</p>
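<p>For example - with a made-up enum, but representative of the pattern - after the first couple of branches the model will happily complete the rest:</p>
<pre class="literal-block">
# hypothetical enum and helper; the point is the repetitive pattern
from enum import Enum

class MigrationStatus(Enum):
    NONE = 0
    SETUP = 1
    ACTIVE = 2
    COMPLETED = 3
    FAILED = 4

def migration_status_str(status):
    if status is MigrationStatus.NONE:
        return "none"
    if status is MigrationStatus.SETUP:
        return "setup"
    if status is MigrationStatus.ACTIVE:     # from here on the model just
        return "active"                      # carries on the obvious pattern
    if status is MigrationStatus.COMPLETED:
        return "completed"
    if status is MigrationStatus.FAILED:
        return "failed"
    return "unknown"
</pre>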
<p>There is a potential trap however with using LLMs to write code. As there is no source code and the proprietary models
are fairly cagey about exactly what data the models were trained on, there are worries about them committing copyright
infringement. There are active debates ongoing in the open source community (e.g. <a href="https://lists.gnu.org/archive/html/qemu-devel/2023-11/msg05007.html" title="link to archive of discussion about LLM code generation">on
qemu-devel</a>) about the potential ramifications of a model regurgitating its training data. Without clarity on what
license that data has there is a risk of contaminating projects with code of unknown provenance. While I'm sure these
issues will be resolved in time it's certainly a problem you need to be cognisant of.</p>
<h2>Prose</h2>
<p>Writing prose is much more natural territory for LLMs and an area where low-effort text generation will be
rapidly replaced by generative models like ChatGPT. "My" previous <a href="https://www.bennee.com/~alex/blog/2023/10/22/comparing-forge-based-and-email-based-workflow-for-open-source-projects/#comparing-forge-based-and-email-based-workflow-for-open-source-projects">blog
post</a>
was mostly written by ChatGPT based on a simple brief and a few requests for rewrites in a chat session. While it made
the process fairly quick the result comes across as a little bland and "off". I find there is a tendency for LLMs to
fall back on fairly obvious generalisations and erase any unique authorial voice there may have been.</p>
<p>However if you give it enough structure it's very easy to get an LLM to expand a bullet list into more flowery prose.
They are more powerful when being fed a large piece of text and asked to summarise key information in a more accessible
way.</p>
<p>They are certainly an easy way to give a first pass review of your writing although I try to re-phrase things myself
rather than accept suggestions verbatim to keep my voice coming through the text.</p>
<h1>Final Thoughts</h1>
<p>The recent advances in LLMs and the public's exposure to popular tools like ChatGPT have certainly propelled the topic
of AI into the zeitgeist. While we are almost certainly approaching the "Peak of Inflated Expectations" stage of the <a href="https://en.wikipedia.org/wiki/Gartner_hype_cycle">hype
cycle</a> they will undoubtedly be an important step on the road to the
eventual goal of <a href="https://en.wikipedia.org/wiki/Artificial_general_intelligence">Artificial General Intelligence (AGI)</a>.
We are still a long way from being able to ask computers to solve complex problems the way they can in, for example,
Star Trek. However in their current form they will certainly have a big impact on the way we work over the next decade
or so.</p>
<p>It's important that as a society we learn about how they are built, what their limitations are and understand the
computational cost and resultant impact on the environment. It will be a while before I'd want to trust a set of magic
numbers over a carefully developed algorithm to actuate the control surfaces on a plane I'm flying on. However they are
already well placed to help us learn new information through interactive questioning and summarising random information
on the internet. We must learn to recognise when we've gone down a hallucinatory rabbit hole and verify what we've learned
with reference to trusted sources.</p>Workbooks for Benchmarking<p>While working on a major re-factor of <a class="reference external" href="http://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01330.html">QEMU's softfloat code</a> I've been doing a lot of benchmarking. It can be quite tedious work as you need to be careful you've run the correct steps on the correct binaries and keeping notes is important. It is a task that cries out for scripting but that in itself can be a compromise as you end up stitching a pipeline of commands together in something like perl. You may script it all in a language designed for this sort of thing like R but then find your final upload step is a pain to implement.</p>
<p>One solution to this is to use a literate programming workbook like <a class="reference external" href="https://github.com/stsquad/testcases/blob/master/aarch64/benchmark.org">this</a>. Literate programming is a style where you interleave your code with natural prose describing the steps you go through. This is different from simply having well commented code in a source tree. For one thing you do not have to leap around a large code base as everything you need is in the file you are reading, from top to bottom. There are many solutions out there including <a class="reference external" href="https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks">various python based examples</a>. Of course being a happy Emacs user I use one of its stand-out features <a class="reference external" href="https://en.wikipedia.org/wiki/Org-mode">org-mode</a> which comes with multi-language <a class="reference external" href="https://orgmode.org/worg/org-contrib/babel/">org-babel</a> support. This allows me to document my benchmarking while scripting up the steps in a variety of "languages" depending on my needs at the time. Let's take a look at the first section:</p>
<blockquote>
<p class="rubric" id="orgb7da4a0">1 Binaries To Test</p>
<div class="outline-text-2" id="text-1"><div class="line-block">
<div class="line">Here we have several tables of binaries to test. We refer to the</div>
<div class="line">current benchmarking set from the next stage, Run Benchmark.</div>
</div>
<div class="line-block">
<div class="line">For a final test we might compare the system QEMU with a reference</div>
<div class="line">build as well as our current build.</div>
</div>
<table border="1" class="docutils">
<colgroup>
<col width="81%"/>
<col width="19%"/>
</colgroup>
<thead valign="bottom">
<tr><th class="head">Binary</th>
<th class="head">title</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>/usr/bin/qemu-aarch64</td>
<td>system-2.5.log</td>
</tr>
<tr><td>~/lsrc/qemu/qemu-builddirs/arm-targets.build/aarch64-linux-user/qemu-aarch64</td>
<td>master.log</td>
</tr>
<tr><td>~/lsrc/qemu/qemu.git/aarch64-linux-user/qemu-aarch64</td>
<td>softfloat-v4.log</td>
</tr>
</tbody>
</table>
</div></blockquote><p>Well that is certainly fairly self-explanatory. These are named org-mode tables which can be referred to in other code snippets and passed in as variables. So the next job is to run the benchmark itself:</p>
<blockquote>
<p class="rubric" id="org5a36bd2">2 Run Benchmark</p>
<div class="outline-text-2" id="text-2"><p>This runs the benchmark against each binary we have selected above.</p>
<pre class="literal-block">
import subprocess

# "files" and "tests" are passed in as org-babel variables from the tables above
runs = []
for qemu, logname in files:
    # pin to a single CPU for consistency and capture the output in a log file
    cmd = "taskset -c 0 %s ./vector-benchmark -n %s | tee %s" % (qemu, tests, logname)
    subprocess.call(cmd, shell=True)
    runs.append(logname)
return runs
</pre>
</div></blockquote>
<p>So why use python as the test runner? Well, the truth is whenever I end up munging arrays in shell script I forget the syntax and end up jumping through all sorts of hoops. It's easier just to have some simple python. I use python again later to read the data back into an org-table so I can pass it to the next step, graphing.</p>
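<p>That read-back is another small babel block. A minimal sketch, assuming each log file contains lines of the form "test-name: nanoseconds-per-kop" (the real log format may well differ), might look like this:</p>
<blockquote>
<pre class="literal-block">
import re

# "runs" is the list of log files returned by the benchmark block above
table = [["test", "nsecs/Kop"]]
for logname in runs:
    with open(logname) as f:
        for line in f:
            m = re.match(r"(\S+):\s*([\d.]+)", line)
            if m:
                table.append([m.group(1), float(m.group(2))])
return table
</pre>
</blockquote>
<p>org-babel renders the returned list of lists as an org-table which can then be passed straight into the graphing step:</p>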
<blockquote>
<pre class="literal-block">
set title "Vector Benchmark Results (lower is better)"
set style data histograms
set style fill solid 1.0 border lt -1
set xtics rotate by 90 right
set yrange [:]
set xlabel noenhanced
set ylabel "nsecs/Kop" noenhanced
set xtics noenhanced
set ytics noenhanced
set boxwidth 1
set xtics format ""
set xtics scale 0
set grid ytics
set term pngcairo size 1200,500
plot for [i=2:5] data using i:xtic(1) title columnhead
</pre>
</blockquote>
<p>This is a <a class="reference external" href="https://en.wikipedia.org/wiki/Gnuplot">GNU Plot</a> script which takes the data and plots an image from it. org-mode takes care of the details of marshalling the table data into GNU Plot so all this script is really concerned with is setting styles and titles. The language is capable of some fairly advanced stuff but I could always pre-process the data with something else if I needed to.</p>
<p>Finally I need to upload my graph to an image hosting service to share with my colleagues. This can be done with an elaborate curl command but I have another trick at my disposal thanks to the excellent <a class="reference external" href="https://github.com/pashky/restclient.el">restclient-mode</a>. This mode is actually designed for interactive debugging of REST APIs but it is also easy to use from an org-mode source block. So the whole thing looks like an HTTP session:</p>
<blockquote>
<pre class="literal-block">
:client_id = feedbeef
# Upload images to imgur
POST https://api.imgur.com/3/image
Authorization: Client-ID :client_id
Content-type: image/png
< benchmark.png
</pre>
</blockquote>
<p>Finally because the above dumps all the headers when run (which is very handy for debugging) I actually only want the URL in most cases. I can do this simply enough in elisp:</p>
<blockquote>
<pre class="literal-block">
#+name: post-to-imgur
#+begin_src emacs-lisp :var json-string=upload-to-imgur()
(when (string-match
       (rx "link" (one-or-more (any "\":" whitespace))
           (group (one-or-more (not (any "\"")))))
       json-string)
  (match-string 1 json-string))
#+end_src
</pre>
</blockquote>
<p>The :var line automatically calls the restclient-mode block and passes in its result, from which the elisp can then extract the final URL.</p>
<p>And there you have it, my entire benchmarking workflow documented in a single file which I can read through, tweaking each step as I go. This isn't the first time I've done this sort of thing. As I use org-mode extensively as a logbook to keep track of my upstream work I've slowly grown a series of scripts for common tasks. For example every patch series and pull request I post is done via org. I keep the whole thing in a git repository so each time I finish a sequence I can commit the results into the repository as a permanent record of what steps I ran.</p>
<p>If you want even more inspiration I suggest you look at John Kitchen's <a class="reference external" href="http://kitchingroup.cheme.cmu.edu/scimax">scimax</a> work. As a publishing scientist he makes extensive use of org-mode when writing his papers. He is able to include the main prose with the code to plot the graphs and tables in a single source document from which his camera ready documents are generated. Should he ever need to reproduce any work his exact steps are all there in the source document. Yet another example of why org-mode is awesome ;-)</p>
Running Linux in QEMU's aarch64 system emulation mode<p>Since I started working on <a class="reference external" href="http://translatedcode.wordpress.com/2014/04/24/64-bit-arm-usermode-emulation-in-qemu-2-0-0/">aarch64 support for QEMU</a> the most frequently asked question I got was "when can I run aarch64 system emulation on QEMU?". Well wait no more as support for a VIRT-IO based aarch64 board was recently merged into the master branch of QEMU. In this post I'll talk about building QEMU, a rootfs and a kernel that will allow you to start experimenting with the architecture.</p>
<div class="section" id="quick-start">
<h2>Quick start</h2>
<p>Let's first start with building and running QEMU with some pre-built images.</p>
<div class="section" id="build-dependancies">
<h3>Build Dependencies</h3>
<p>As has been noted in the comments the <em>configure</em> script will automatically enable features as long as the prerequisite developer libraries are installed on your system. With a Debian/Ubuntu system this is easily achieved by running:</p>
<pre class="literal-block">
sudo apt-get build-dep qemu
</pre>
<p>Of course if you want to enable a feature (either a bleeding edge or non-standard) that requires additional libraries then you will need to install the appropriate development packages manually. The <em>config.log</em> file is usually a useful first step in working out what headers are being looked for.</p>
</div>
<div class="section" id="building-qemu">
<h3>Building QEMU</h3>
<pre class="literal-block">
git clone git://git.qemu.org/qemu.git qemu.git
cd qemu.git
./configure --target-list=aarch64-softmmu
make
</pre>
<p>Assuming the build ran without any problems you should now have an executable <em>./aarch64-softmmu/qemu-system-aarch64</em> in your build directory. Grab a pre-built image from <a class="reference external" href="http://people.linaro.org/~alex.bennee/images/aarch64-linux-3.15rc2-buildroot.img">here</a> and we'll check it works. The image is a kernel that has been combined with an initial RAM disk (initrd) with a basic root file-system. I go into more details on how to create this later on.</p>
<p>Be aware the command line is quite long so make sure you copy it all ;-)</p>
<pre class="literal-block">
wget http://people.linaro.org/~alex.bennee/images/aarch64-linux-3.15rc2-buildroot.img
./aarch64-softmmu/qemu-system-aarch64 -machine virt -cpu cortex-a57 -machine type=virt -nographic -smp 1 -m 2048 -kernel aarch64-linux-3.15rc2-buildroot.img --append "console=ttyAMA0"
</pre>
<p>If all went well you should see the familiar Linux boot sequence and eventually get a login prompt. Log in as root (no password) and play in the new sandbox.</p>
<pre class="literal-block">
... usual kernel boot output ...
Welcome to Buildroot
buildroot login: root
# uname -a
Linux buildroot 3.15.0-rc2ajb-00069-g1aae31c #39 SMP Thu Apr 24 11:48:57 BST 2014 aarch64 GNU/Linux
</pre>
<p>Once you are done type <em>C-a c</em> to enter QEMU's monitor mode and then quit to exit.</p>
<pre class="literal-block">
QEMU 2.0.50 monitor - type 'help' for more information
(qemu) quit
</pre>
</div>
</div>
<div class="section" id="accessing-your-local-file-system">
<h2>Accessing your local file-system</h2>
<p>This is all very well but the test image only has a fairly limited root file-system attached to it. It would be a lot more useful if you could access your host file-system to test other binaries. Thanks to VirtFS we can achieve this without too much hassle. Use the following extended QEMU command line:</p>
<pre class="literal-block">
./aarch64-softmmu/qemu-system-aarch64 -machine virt -cpu cortex-a57 -machine type=virt -nographic -smp 1 -m 2048 -kernel aarch64-linux-3.15rc2-buildroot.img --append "console=ttyAMA0" -fsdev local,id=r,path=/home/alex/lsrc/qemu/rootfs/trusty-core,security_model=none -device virtio-9p-device,fsdev=r,mount_tag=r
</pre>
<p>This sets up the selected path to be mountable by the guest. In this case I'm using an Ubuntu rootfs which can be downloaded from <a class="reference external" href="http://cdimages.ubuntu.com/ubuntu-core/releases/14.04/release/">here</a>. Once the system has booted the following commands on the guest will mount the local file-system:</p>
<pre class="literal-block">
Welcome to Buildroot
buildroot login: root
# mount -t 9p -o trans=virtio r /mnt
# ls -l /mnt/
total 84
drwxr-xr-x 2 default default 4096 Apr 2 2014 bin
drwxr-xr-x 2 default default 4096 Feb 27 2014 boot
drwxr-xr-x 3 default default 4096 Apr 2 2014 dev
drwxr-xr-x 64 default default 4096 Apr 3 2014 etc
drwxr-xr-x 2 default default 4096 Feb 27 2014 home
..
</pre>
</div>
<div class="section" id="building-your-own-rootfs">
<h2>Building your own rootfs</h2>
<p>There are many solutions to this (including downloading <a class="reference external" href="http://www.linaro.org/downloads/">Linaro engineering builds</a>) but the simplest one I've found for rolling your own from scratch is the <a class="reference external" href="http://buildroot.uclibc.org/">Buildroot project</a>. It presents the familiar kernel menuconfig interface and deals with all the hassle of setting up cross compilers for you.</p>
<pre class="literal-block">
git clone git://git.buildroot.net/buildroot buildroot.git
cd buildroot.git
make menuconfig
</pre>
<p>There are lots of configuration options to choose from but the following are what I use:</p>
<div class="line-block">
<div class="line">* Target Options -> Target Architecture(AArch64)</div>
<div class="line">* Toolchain -> Toolchain type (External toolchain)</div>
<div class="line">* Toolchain -> Toolchain (Linaro AArch64 14.02)</div>
<div class="line">* System configuration -> Run a getty (login prompt) after boot (BR2_TARGET_GENERIC_GETTY)</div>
<div class="line">* System configuration -> getty options -> TTY Port (ttyAMA0) (BR2_TARGET_GENERIC_GETTY_PORT)</div>
<div class="line">* Target Packages -> Show packages that are also provided by busybox (BR2_PACKAGE_BUSYBOX_SHOW_OTHERS)</div>
<div class="line">* Filesystem images -> cpio the root filesystem (for use as an initial RAM filesystem) (BR2_TARGET_ROOTFS_CPIO)</div>
</div>
<p>The last one will be important for when we build the kernel next. Once you have configured buildroot to your liking it's time to type make and leave it for a while as you enjoy a nice lunch ;-)</p>
<pre class="literal-block">
make
.. lots of output ..
</pre>
</div>
<div class="section" id="building-a-kernel">
<h2>Building a kernel</h2>
<p>For building the kernel I use my distro's aarch64 cross-compiler. On Debian/Ubuntu systems this is easily added with:</p>
<pre class="literal-block">
$ sudo apt-get install gcc-aarch64-linux-gnu
</pre>
<p>And the usual kernel building process, with a few tweaks for cross compiling:</p>
<pre class="literal-block">
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux.git
cd linux.git
ARCH=arm64 make menuconfig
</pre>
<p>I've put my full config up <a class="reference external" href="http://people.linaro.org/~alex.bennee/images/aarch64-kernel.config">here</a> but important options to note are:</p>
<pre class="literal-block">
CONFIG_CROSS_COMPILE="aarch64-linux-gnu-" # needs to match your cross-compiler prefix
CONFIG_INITRAMFS_SOURCE="/home/alex/lsrc/qemu/buildroot.git/output/images/rootfs.cpio" # points at your buildroot image
CONFIG_NET_9P=y # needed for virtfs mount
CONFIG_NET_9P_VIRTIO=y
</pre>
<p>Finally you build it all with:</p>
<pre class="literal-block">
ARCH=arm64 make -j 8
</pre>
<p>The <em>-j 8</em> just specifies how many parallel build threads to use. Generally set it to the number of cores you have on your machine.</p>
</div>
<div class="section" id="final-test">
<h2>Final test</h2>
<p>All that remains is to test that the newly built kernel works:</p>
<pre class="literal-block">
./aarch64-softmmu/qemu-system-aarch64 -machine virt -cpu cortex-a57 -machine type=virt -nographic -smp 1 -m 2048 -kernel ../linux.git/arch/arm64/boot/Image --append "console=ttyAMA0"
... lots more output ...
Welcome to Buildroot
ajbtest login: root
[root@ajbtest ~]# ls -l
total 0
[root@ajbtest ~]# uname -a
Linux ajbtest 3.15.0-rc4ajb-00320-gafcf0a2-dirty #41 SMP Fri May 9 13:05:31 BST 2014 aarch64 GNU/Linux
</pre>
<div class="line-block">
<div class="line"><strong>UPDATED:</strong> 27/05/2014</div>
<div class="line">* Added notes about library dependencies</div>
<div class="line">* Cleaned up formatting of shell sections, mention length of command line!</div>
<div class="line">* Fix some spelling errors</div>
</div>
</div>