A Systems Programmer's Perspectives on Generative AI<p>Alex discusses his experience playing with the current crop of large language models and muses on the power of processors multiplying lots of numbers together.</p><p>Like many people over the last few months I've been playing with a number of Large Language Models (LLMs). LLMs are
perhaps best typified by the current media star ChatGPT. It is hard to avoid the buzz while every tech
titan is developing its "AI" play and people are exposed to tools where the label of Artificial Intelligence is
liberally applied. The ability of these models to spit out <a href="https://www.bennee.com/~alex/blog/2023/10/22/comparing-forge-based-and-email-based-workflow-for-open-source-projects/">competent comprehensible
text</a> is seemingly a step change in ability compared to previous generations of tech.</p>
<p>I thought I would try and collect some of my thoughts and perspectives on this from the point of view of a <a href="https://en.wikipedia.org/wiki/Systems_programming" title="link to wikipedia definition">systems
programmer</a>. For those not familiar
with the term, it refers to the low level work of providing the platforms on which the applications people actually use run. In
my case a lot of the work I do is on <a href="https://www.qemu.org" title="link to QEMU homepage">QEMU</a>, which involves emulating the very
lowest level instructions a computer can do: the simple arithmetic and comparison of numbers that all code is eventually
expressed as.</p>
<h1>Magic numbers and computing them</h1>
<p>I claim no particular expertise in machine learning, so expect this to be a very superficial explanation of what's going
on.</p>
<p>In normal code the CPU tends to execute a lot of different instruction sequences as a program runs through solving the
problem you have set it. The code that calculates where to draw your window will be different to the code checking the
network for new data, or the logic that stores information safely on your file system. Each of those tasks is decomposed
and abstracted into simpler and simpler steps until eventually it is simple arithmetic dictating what the processor
should do next. You occasionally see hot spots where a particular sequence of instructions is doing a lot of heavy
lifting. There is a whole discipline devoted to managing computational complexity and ensuring algorithms are as
efficient as possible.</p>
<p>However the various technologies that are currently wowing the world work very differently. They are models of various
networks represented by a series of magic numbers or "weights" arranged in a hierarchical structure of interconnected
<a href="https://en.wikipedia.org/wiki/Matrix_(mathematics)">matrices</a>. While there is a lot of nuance to how problems are
encoded and fed into these models, fundamentally the core piece of computation is multiplying a bunch of numbers with
another bunch of numbers and feeding their results into the next layer of the network. At the end of the process the model
spits out a prediction of what the most likely next word is going to be. After selecting one the cycle repeats, taking into
account our expanded context to predict the word after that.</p>
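<p>To make that concrete, here is a toy sketch of the multiply-and-predict loop. Nothing here resembles a real model - the tiny shapes, the random "weights" and the five word vocabulary are all made up for illustration - but the heart of the computation really is just repeated matrix multiplies followed by picking the highest scoring word:</p>
<pre class="literal-block">
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "quick", "brown", "fox", "jumps"]
embed = rng.standard_normal((len(vocab), 16))               # word number to vector
layers = [rng.standard_normal((16, 16)) for _ in range(4)]  # the "magic numbers"
unembed = rng.standard_normal((16, len(vocab)))             # vector back to word scores

context = [0]                                               # start with "the"
for _ in range(4):
    x = embed[context].mean(axis=0)                         # crude summary of the context
    for w in layers:
        x = np.maximum(x @ w, 0)                            # multiply, then a simple non-linearity
    scores = x @ unembed
    probs = np.exp(scores) / np.exp(scores).sum()           # turn scores into probabilities
    context.append(int(probs.argmax()))                     # pick the most likely "next word"

print(" ".join(vocab[i] for i in context))
</pre>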
<p>The "models" that drive these things are described mostly by the number of parameters they have. This encompasses the
number of inputs and outputs they have and the number of numbers in between. For example common small open source models
start at 3 billion parameters with 7, 13 and 34 billion also being popular sizes. Beyond that it starts getting hard to
run models locally on all but the most tricked out desktop PCs. As a developer my desktop is pretty beefy (32 cores,
64GB RAM) and can chew through computationally expensive builds pretty easily. However as I can't off-load processing
onto my GPU a decent sized model will chug out a few words a second while maxing out my CPU. The GPT-4 model behind ChatGPT is
speculated to have about 1.7 trillion parameters and needs expensive cloud hardware to run - I certainly don't
envy <a href="https://openai.com/">OpenAI</a> their infrastructure bill.</p>
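<p>Some back-of-envelope arithmetic shows why. Assuming roughly 2 bytes per parameter (16 bit floats) - a simplification, as precision and overheads vary - just holding the weights looks something like this:</p>
<pre class="literal-block">
# rough memory footprint of the weights alone, at 2 bytes per parameter
for params in (3e9, 7e9, 13e9, 34e9, 1.7e12):
    print("%5.0f billion parameters: roughly %.0f GB of weights" % (params / 1e9, params * 2 / 1e9))
</pre>
<p>Which is why a 7 billion parameter model sits comfortably on a well specified desktop while the speculated GPT-4 sized models are firmly data centre territory.</p>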
<p>Of course the computational power needed to run these models is a mere fraction of what it took to train them. In fact
the bandwidth and processing requirements are so large it pays to develop custom silicon that is really good at
multiplying large amounts of numbers and not much else. You can get a lot more bang for your buck compared to running
those calculations on a general purpose CPU designed for tackling a wide range of computation problems.</p>
<h1>The Value of Numbers</h1>
<p>Because of the massive investment in synthesising these magic numbers they themselves become worth something. The "magic
sauce" behind a model is more about how it was trained and what data was used to do it. We already know its possible to
encode societies biases into models due to sloppy selection of the input data. One of the principle criticisms of
proprietary generative models is how opaque the training methods are making it hard to judge their safety. The degree to
which models may regurgitate data without any transformation is hard to quantify when you don't know what went into it.</p>
<p>As I'm fundamentally more interested in knowing how the technology I use works under the hood it's fortunate there is a
growing open source community working on building their own models. Credit should be given to Meta who made their
language model <a href="https://ai.meta.com/llama/" title="link to Meta's Llama page">LLaMA 2</a> freely available on fairly permissive
terms. Since then there has been an explosion of open source projects that can run the models (e.g:
<a href="https://github.com/ggerganov/llama.cpp" title="link to the llama.cpp project for running CPU bound models">llama.cpp</a>,
<a href="https://ollama.ai/" title="link to Ollama, another tool for locally running models">Ollama</a>) and provide front-ends (e.g:
<a href="https://github.com/oobabooga/text-generation-webui" title="link to Oobabooga's text generation UI">Oobabooga's text generation UI</a>, <a href="https://github.com/s-kostyaev/ellama" title="link to the Ellama github page">Ellama front-end for Emacs</a>) for them.</p>
<h1>Smaller Magic Numbers</h1>
<p>The principal place where this work is going on is <a href="https://huggingface.co/" title="link to the Hugging Face website">Hugging Face</a>. Think of it as the <a href="https://github.com" title="GitHub">GitHub</a> of the machine learning community. It provides an
environment for publishing and collaborating on data sets and models as well as hosting and testing their effectiveness in
various benchmarks. This makes experimenting with models accessible to developers who aren't part of the well funded
research divisions of the various tech titans. Datasets for example come with
<a href="https://huggingface.co/docs/hub/datasets-cards">cards</a> which describe the sources that went into these multi-terabyte
files.</p>
<p>One example of such a dataset is the <a href="https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T" title="link to the RedPajama dataset">RedPajama dataset</a>. This is an open source initiative to recreate the LLaMA training data which combines
data from the open web as well as numerous permissively licensed sources such as Wikipedia, GitHub, StackExchange and
ArXiv. This dataset has been used to train models like <a href="https://huggingface.co/openlm-research" title="link to OpenLM research Hugging Face pages">OpenLLaMA</a> in an attempt to provide an unencumbered version of Meta's LLaMA models. However
training up these foundational models is an expensive and time consuming task; the real action is in taking these models
and then fine tuning them for particular tasks.</p>
<p>To fine tune a model you first take a general purpose model and further train it against data with a specific task in
mind. The purpose of this is not only to make your new model better suited for a particular task but also to optimise
the number of calculations that model has to do to achieve acceptable results. This is also where the style of prompting
will be set as you feed the model examples of the sort of questions and answers you want it to give.</p>
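<p>As a rough illustration of the shape of the process, here is a minimal fine tuning sketch using the Hugging Face libraries. The checkpoint name and the train_dataset variable are stand-ins for whatever task data you have prepared - this is a sketch of the idea rather than a recipe I've run end to end:</p>
<pre class="literal-block">
# minimal fine tuning sketch; "train_dataset" is assumed to be a tokenised
# set of example prompts and answers for your particular task
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

base = "openlm-research/open_llama_3b"           # any causal language model checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

args = TrainingArguments(output_dir="finetuned",
                         per_device_train_batch_size=1,
                         num_train_epochs=1)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()                                  # further training on the task data
trainer.save_model("finetuned")
</pre>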
<p>There are further stages that can be applied, including "alignment" where you ensure results are broadly in tune with the
values of the organisation. This is the reason the various chatbots around won't readily cough up the recipe to build
nukes or make it easier to explicitly break the law. This can be augmented with Reinforcement Learning from Human
Feedback (RLHF) which is practically the purpose of every <a href="https://en.wikipedia.org/wiki/CAPTCHA">CAPTCHA</a> you'll have
filled in over the last 25 years online.</p>
<p>Finally the model can be quantised to make it more manageable. This takes advantage of the fact that a lot of the
numbers will have a negligible effect on the result for a wide range of inputs. In those cases there is no point
storing them at full precision. As computation is a function of the number of bits of information being processed this
also reduces the cost of computation. While phones and other devices are increasingly including dedicated hardware to
process these models they are still constrained by physics - and the more you process the more heat you need to
dissipate, the more battery you use and the more bandwidth you consume. Obviously the more aggressively you quantise the
model the worse it will perform, so there is an engineering trade off to make. Phones work best with multiple highly
tuned models solving specific tasks as efficiently as possible. Fully flexible models giving a
<a href="https://en.wikipedia.org/wiki/J.A.R.V.I.S.">J.A.R.V.I.S</a> like experience will probably always need to run in the cloud
where thermal management is simply an exercise in plumbing.</p>
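<p>A toy version of the quantisation idea - real schemes are considerably cleverer about grouping weights and handling outliers - is to store a weight matrix as 8 bit integers plus a single scale factor:</p>
<pre class="literal-block">
import numpy as np

weights = np.random.randn(4096, 4096).astype(np.float32)   # one fp32 weight matrix
scale = np.abs(weights).max() / 127.0                       # map the largest magnitude to 127
q = np.round(weights / scale).astype(np.int8)               # a quarter of the storage

restored = q.astype(np.float32) * scale                     # approximate reconstruction at run time
print("worst case error: %.4f" % np.abs(weights - restored).max())
print("fp32 bytes: %d, int8 bytes: %d" % (weights.nbytes, q.nbytes))
</pre>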
<h1>Making magic numbers work for you</h1>
<p>Before we get on to using models I want to discuss 3 more concepts: "prompts", "context" and "hallucinations".</p>
<p>The prompt is the closest thing there is to "programming" the model. The prompt can be purely explicit or include other
inputs behind the scenes. For example the prompt can instruct the model to be friendly or terse, decorate code snippets
with markdown, make changes as diffs or in full functions. Generally the more explicit your prompt is about what you
want the better the result you get from the model. <a href="https://en.wikipedia.org/wiki/Prompt_engineering">Prompt
engineering</a> has the potential to be one of those newly created job
titles that will have to replace the jobs obsoleted by advancing AI. One of the ways to embed AI APIs into your app is
to create a task specific prompt that will be put in front of user input that guides the results to what you want.</p>
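<p>In its simplest form that embedding is little more than string concatenation before the text is handed to whichever completion API or front-end you are using. A sketch, with the wording of the prompt purely illustrative:</p>
<pre class="literal-block">
# prepend a task specific prompt to whatever the user supplies; the exact
# wording here is made up for illustration
TASK_PROMPT = (
    "You are a terse code review assistant. Only report actual bugs or "
    "undefined behaviour in the code below, as a numbered list. "
    "Do not comment on style or naming.\n\n"
)

def build_review_request(user_code):
    return TASK_PROMPT + user_code

print(build_review_request("int idx = strlen(s) - 1;  /* what if s is empty? */"))
</pre>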
<p>The "context" is the rest of the input into the model. That could be the current conversation in a chat or the current
page of source code in a code editor. The larger the context the more reference the model has for its answer although
that does come at the cost of even more computation as a larger context means more input for the model to process.</p>
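<p>A crude way to picture this is a chat loop where the whole conversation so far is fed back in on every turn - here simply counting words as a stand-in for real tokens:</p>
<pre class="literal-block">
# each exchange is appended and the whole lot goes back through the model,
# so the amount of input to process grows turn by turn
context = "You are a helpful assistant.\n"
exchanges = [
    ("What does QEMU do?", "QEMU emulates machines..."),
    ("How does TCG work?", "TCG translates guest instructions..."),
]
for question, answer in exchanges:
    context += "User: %s\nAssistant: %s\n" % (question, answer)
    print("input for next turn: %d words" % len(context.split()))
</pre>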
<p>In a strong candidate for 2023's word of the year "hallucination" describes the quirky and sometimes unsettling behaviour
of models outputting weird, sometimes contradictory information. They will sincerely and confidently answer questions
with blatant lies or start <a href="https://www.theregister.com/2023/12/01/chatgpt_poetry_ai/">regurgitating training data</a> when
given certain prompts. It is a salient reminder that the statistical nature of these generative models will mean they
occasionally spout complete rubbish. They are also very prone to following the lead of their users - the longer you chat
with a model the more likely it is to end up agreeing with you.</p>
<p>So let's talk about what these models can and can't do. As a developer one of the areas I'm most interested in is their
ability to write code. Systems code especially is an exercise in precisely instructing a computer what to do in explicit
situations. I'd confidently predicted my job would be one of the last to succumb to the advance of AI as systems aren't
something you can get "mostly" right. It was quite a shock when I first saw just how sophisticated the generated code
can be.</p>
<h2>Code Review</h2>
<p>One of the first things I asked ChatGPT to do was review a function I'd written. It managed to make 6 observations about
the code, 3 of which were actual logic problems I'd missed and 3 of which were general points about variable naming and
comments. The prompt is pretty important though. If not constrained to point out actual problems LLMs have a
tendency to spit out rather generic advice about writing clean well commented code.</p>
<p>They can be super useful when working with an unfamiliar language or framework. If you are having trouble getting
something to work it might be faster to ask an LLM how to fix your function than spending time reading multiple
<a href="https://stackoverflow.com/">StackOverflow</a> answers to figure out what you've misunderstood. If compiler errors are
confusing, supplying the message alongside the code can often be helpful in understanding what's going on.</p>
<h2>Writing Code</h2>
<p>However rather than just suggesting changes one very tempting use case is writing code from scratch based on a
description of what you want. Here the context is very important, the more detail you provide the better chance of
generating something useful. My experience has been that the solutions are usually fairly rudimentary and can often
benefit from a manual polishing step once you have something working.</p>
<p>For my <a href="https://www.bennee.com/~alex/presentations/kvm23-qemu-keynote.html">QEMU KVM Forum 2023 Keynote</a> I got ChatGPT
to write the first draft of a number of my data processing scripts. However it missed obvious optimisations by
repeatedly reading values inside inner loops that made the scripts slower than they needed to be.</p>
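<p>A hypothetical example of the kind of thing it missed - re-reading an invariant value inside the inner loop rather than hoisting it out:</p>
<pre class="literal-block">
# hypothetical illustration, not the actual generated script
def total_generated(rows, config):
    total = 0
    for row in rows:
        for value in row:
            scale = config["scale"]   # looked up on every single iteration
            total += value * scale
    return total

def total_polished(rows, config):
    scale = config["scale"]           # read the invariant once, outside the loops
    total = 0
    for row in rows:
        for value in row:
            total += value * scale
    return total
</pre>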
<p>If the task is a straight transformation they are very good. Ask an LLM to convert a function in one language into
another and it will do a pretty good job - and probably with fewer mistakes than your first attempt. However there are
limitations. For example I asked a model to convert some AArch64 assembler into the equivalent 32 bit Arm assembler. It
did a very good job of the mechanical part of that but missed the subtle differences in how to setup the MMU. This
resulted in code which compiled but didn't work until debugged by a human who was paying close attention to the
architecture documentation as they went.</p>
<p>One of the jobs LLMs are very well suited to is writing code that matches an existing template. For example if you are
mechanically transforming a bunch of enums into a function to convert them to strings you need only do a few examples
before there is enough context for the LLM to reliably figure out what you are doing. LLMs are a lot more powerful than
a simple template expansion because you don't need to explicitly define a template first. The same is true of tasks like
generating test fixtures for your code.</p>
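<p>For example - with a made-up enum, but representative of the pattern - after the first couple of branches the model will happily complete the rest:</p>
<pre class="literal-block">
# hypothetical enum and helper; the point is the repetitive pattern
from enum import Enum

class MigrationStatus(Enum):
    NONE = 0
    SETUP = 1
    ACTIVE = 2
    COMPLETED = 3
    FAILED = 4

def migration_status_str(status):
    if status is MigrationStatus.NONE:
        return "none"
    if status is MigrationStatus.SETUP:
        return "setup"
    if status is MigrationStatus.ACTIVE:     # from here on the model just
        return "active"                      # carries on the obvious pattern
    if status is MigrationStatus.COMPLETED:
        return "completed"
    if status is MigrationStatus.FAILED:
        return "failed"
    return "unknown"
</pre>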
<p>There is a potential trap however with using LLMs to write code. As there is no source code and the proprietary models
are fairly cagey about exactly what data the models were trained on, there are worries about them committing copyright
infringement. There are active debates ongoing in the open source community (e.g. <a href="https://lists.gnu.org/archive/html/qemu-devel/2023-11/msg05007.html" title="link to archive of discussion about LLM code generation">on
qemu-devel</a>) about the potential ramifications of a model regurgitating its training data. Without clarity on what
license that data has there is a risk of contaminating projects with code of unknown provenance. While I'm sure these
issues will be resolved in time it's certainly a problem you need to be cognisant of.</p>
<h2>Prose</h2>
<p>Writing prose is much more natural territory for LLMs and an area where low-effort text generation will be
rapidly replaced by generative models like ChatGPT. "My" previous <a href="https://www.bennee.com/~alex/blog/2023/10/22/comparing-forge-based-and-email-based-workflow-for-open-source-projects/#comparing-forge-based-and-email-based-workflow-for-open-source-projects">blog
post</a>
was mostly written by ChatGPT based on a simple brief and a few requests for rewrites in a chat session. While it made
the process fairly quick the result comes across as a little bland and "off". I find there is a tendency for LLMs to
fall back on fairly obvious generalisations and erase any unique authorial voice there may have been.</p>
<p>However if you give it enough structure it's very easy to get an LLM to expand a bullet list into more flowery prose.
They are more powerful when being fed a large piece of text and asked to summarise key information in a more accessible
way.</p>
<p>They are certainly an easy way to give a first pass review of your writing although I try to re-phrase things myself
rather than accept suggestions verbatim to keep my voice coming through the text.</p>
<h1>Final Thoughts</h1>
<p>The recent advances in LLMs and the public's exposure to popular tools like ChatGPT have certainly propelled the topic
of AI into the zeitgeist. While we are almost certainly approaching the "Peak of Inflated Expectations" stage of the <a href="https://en.wikipedia.org/wiki/Gartner_hype_cycle">hype
cycle</a> they will undoubtedly be an important step on the road to the
eventual goal of <a href="https://en.wikipedia.org/wiki/Artificial_general_intelligence">Artificial General Intelligence (AGI)</a>.
We are still a long way from being able to ask computers to solve complex problems the way they can in, for example,
Star Trek. However in their current form they will certainly have a big impact on the way we work over the next decade
or so.</p>
<p>It's important that as a society we learn about how they are built, what their limitations are and understand the
computational cost and resultant impact on the environment. It will be a while before I'd want to trust a set of magic
numbers over a carefully developed algorithm to actuate the control surfaces on a plane I'm flying on. However they are
already well placed to help us learn new information through interactive questioning and summarising random information
on the internet. We must learn to recognise when we've gone down a hallucinatory rabbit hole and verify what we've learned
with reference to trusted sources.</p>Workbooks for Benchmarking<p>While working on a major re-factor of <a class="reference external" href="http://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01330.html">QEMU's softfloat code</a> I've been doing a lot of benchmarking. It can be quite tedious work as you need to be careful you've run the correct steps on the correct binaries and keeping notes is important. It is a task that cries out for scripting but that in itself can be a compromise as you end up stitching a pipeline of commands together in something like perl. You may script it all in a language designed for this sort of thing like R but then find your final upload step is a pain to implement.</p>
<p>One solution to this is to use a literate programming workbook like <a class="reference external" href="https://github.com/stsquad/testcases/blob/master/aarch64/benchmark.org">this</a>. Literate programming is a style where you interleave your code with natural prose describing the steps you go through. This is different from simply having well commented code in a source tree. For one thing you do not have to leap around a large code base as everything you need is in the file you are reading, from top to bottom. There are many solutions out there including <a class="reference external" href="https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks">various python based examples</a>. Of course being a happy Emacs user I use one of its stand-out features <a class="reference external" href="https://en.wikipedia.org/wiki/Org-mode">org-mode</a> which comes with multi-language <a class="reference external" href="https://orgmode.org/worg/org-contrib/babel/">org-babel</a> support. This allows me to document my benchmarking while scripting up the steps in a variety of "languages" depending on my needs at the time. Let's take a look at the first section:</p>
<blockquote>
<p class="rubric" id="orgb7da4a0">1 Binaries To Test</p>
<div class="outline-text-2" id="text-1"><div class="line-block">
<div class="line">Here we have several tables of binaries to test. We refer to the</div>
<div class="line">current benchmarking set from the next stage, Run Benchmark.</div>
</div>
<div class="line-block">
<div class="line">For a final test we might compare the system QEMU with a reference</div>
<div class="line">build as well as our current build.</div>
</div>
<table border="1" class="docutils">
<colgroup>
<col width="81%"/>
<col width="19%"/>
</colgroup>
<thead valign="bottom">
<tr><th class="head">Binary</th>
<th class="head">title</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>/usr/bin/qemu-aarch64</td>
<td>system-2.5.log</td>
</tr>
<tr><td>~/lsrc/qemu/qemu-builddirs/arm-targets.build/aarch64-linux-user/qemu-aarch64</td>
<td>master.log</td>
</tr>
<tr><td>~/lsrc/qemu/qemu.git/aarch64-linux-user/qemu-aarch64</td>
<td>softfloat-v4.log</td>
</tr>
</tbody>
</table>
</div></blockquote><p>Well that is certainly fairly self-explanatory. These are named org-mode tables which can be referred to in other code snippets and passed in as variables. So the next job is to run the benchmark itself:</p>
<blockquote>
<p class="rubric" id="org5a36bd2">2 Run Benchmark</p>
<div class="outline-text-2" id="text-2"><p>This runs the benchmark against each binary we have selected above.</p>
<pre class="literal-block">
import subprocess

# "files" and "tests" are passed in as org-babel variables from the tables above
runs = []
for qemu, logname in files:
    # pin to a single CPU for consistency and capture the output in a log file
    cmd = "taskset -c 0 %s ./vector-benchmark -n %s | tee %s" % (qemu, tests, logname)
    subprocess.call(cmd, shell=True)
    runs.append(logname)
return runs
</pre>
</div></blockquote>
<p>So why use python as the test runner? Well, the truth is whenever I end up munging arrays in shell script I forget the syntax and end up jumping through all sorts of hoops. It's easier just to have some simple python. I use python again later to read the data back into an org-table so I can pass it to the next step, graphing.</p>
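<p>That read-back is another small babel block. A minimal sketch, assuming each log file contains lines of the form "test-name: nanoseconds-per-kop" (the real log format may well differ), might look like this:</p>
<blockquote>
<pre class="literal-block">
import re

# "runs" is the list of log files returned by the benchmark block above
table = [["test", "nsecs/Kop"]]
for logname in runs:
    with open(logname) as f:
        for line in f:
            m = re.match(r"(\S+):\s*([\d.]+)", line)
            if m:
                table.append([m.group(1), float(m.group(2))])
return table
</pre>
</blockquote>
<p>org-babel renders the returned list of lists as an org-table which can then be passed straight into the graphing step:</p>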
<blockquote>
<pre class="literal-block">
set title "Vector Benchmark Results (lower is better)"
set style data histograms
set style fill solid 1.0 border lt -1
set xtics rotate by 90 right
set yrange [:]
set xlabel noenhanced
set ylabel "nsecs/Kop" noenhanced
set xtics noenhanced
set ytics noenhanced
set boxwidth 1
set xtics format ""
set xtics scale 0
set grid ytics
set term pngcairo size 1200,500
plot for [i=2:5] data using i:xtic(1) title columnhead
</pre>
</blockquote>
<p>This is a <a class="reference external" href="https://en.wikipedia.org/wiki/Gnuplot">GNU Plot</a> script which takes the data and plots an image from it. org-mode takes care of the details of marshalling the table data into GNU Plot so all this script is really concerned with is setting styles and titles. The language is capable of some fairly advanced stuff but I could always pre-process the data with something else if I needed to.</p>
<p>Finally I need to upload my graph to an image hosting service to share with my colleagues. This can be done with an elaborate curl command but I have another trick at my disposal thanks to the excellent <a class="reference external" href="https://github.com/pashky/restclient.el">restclient-mode</a>. This mode is actually designed for interactive debugging of REST APIs but it is also easy to use from an org-mode source block. So the whole thing looks like an HTTP session:</p>
<blockquote>
<pre class="literal-block">
:client_id = feedbeef
# Upload images to imgur
POST https://api.imgur.com/3/image
Authorization: Client-ID :client_id
Content-type: image/png
< benchmark.png
</pre>
</blockquote>
<p>Finally because the above dumps all the headers when run (which is very handy for debugging) I actually only want the URL in most cases. I can do this simply enough in elisp:</p>
<blockquote>
<pre class="literal-block">
#+name: post-to-imgur
#+begin_src emacs-lisp :var json-string=upload-to-imgur()
(when (string-match
       (rx "link" (one-or-more (any "\":" whitespace))
           (group (one-or-more (not (any "\"")))))
       json-string)
  (match-string 1 json-string))
#+end_src
</pre>
</blockquote>
<p>The :var line automatically calls the restclient-mode block and passes in its result, from which the elisp can then extract the final URL.</p>
<p>And there you have it, my entire benchmarking workflow documented in a single file which I can read through, tweaking each step as I go. This isn't the first time I've done this sort of thing. As I use org-mode extensively as a logbook to keep track of my upstream work I've slowly grown a series of scripts for common tasks. For example every patch series and pull request I post is done via org. I keep the whole thing in a git repository so each time I finish a sequence I can commit the results into the repository as a permanent record of what steps I ran.</p>
<p>If you want even more inspiration I suggest you look at John Kitchen's <a class="reference external" href="http://kitchingroup.cheme.cmu.edu/scimax">scimax</a> work. As a publishing scientist he makes extensive use of org-mode when writing his papers. He is able to include the main prose with the code to plot the graphs and tables in a single source document from which his camera ready documents are generated. Should he ever need to reproduce any work his exact steps are all there in the source document. Yet another example of why org-mode is awesome ;-)</p>
Running Linux in QEMU's aarch64 system emulation mode<p>Since I started working on <a class="reference external" href="http://translatedcode.wordpress.com/2014/04/24/64-bit-arm-usermode-emulation-in-qemu-2-0-0/">aarch64 support for QEMU</a> the most frequently asked question I got was "when can I run aarch64 system emulation on QEMU?". Well wait no more as support for a VIRT-IO based aarch64 board was recently merged into the master branch of QEMU. In this post I'll talk about building QEMU, a rootfs and a kernel that will allow you to start experimenting with the architecture.</p>
<div class="section" id="quick-start">
<h2>Quick start</h2>
<p>Let's first start with building and running QEMU with some pre-built images.</p>
<div class="section" id="build-dependancies">
<h3>Build Dependencies</h3>
<p>As has been noted in the comments the <em>configure</em> script will automatically enable features as long as the prerequisite developer libraries are installed on your system. With a Debian/Ubuntu system this is easily achieved by running:</p>
<pre class="literal-block">
sudo apt-get build-dep qemu
</pre>
<p>Of course if you want to enable a feature (either a bleeding edge or non-standard) that requires additional libraries then you will need to install the appropriate development packages manually. The <em>config.log</em> file is usually a useful first step in working out what headers are being looked for.</p>
</div>
<div class="section" id="building-qemu">
<h3>Building QEMU</h3>
<pre class="literal-block">
git clone git://git.qemu.org/qemu.git qemu.git
cd qemu.git
./configure --target-list=aarch64-softmmu
make
</pre>
<p>Assuming the build ran without any problems you should now have an executable <em>./aarch64-softmmu/qemu-system-aarch64</em> in your build directory. Grab a pre-built image from <a class="reference external" href="http://people.linaro.org/~alex.bennee/images/aarch64-linux-3.15rc2-buildroot.img">here</a> and we'll check it works. The image is a kernel that has been combined with an initial RAM disk (initrd) with a basic root file-system. I go into more details on how to create this later on.</p>
<p>Be aware the command line is quite long so make sure you copy it all ;-)</p>
<pre class="literal-block">
wget http://people.linaro.org/~alex.bennee/images/aarch64-linux-3.15rc2-buildroot.img
./aarch64-softmmu/qemu-system-aarch64 -machine virt -cpu cortex-a57 -machine type=virt -nographic -smp 1 -m 2048 -kernel aarch64-linux-3.15rc2-buildroot.img --append "console=ttyAMA0"
</pre>
<p>If all went well you should see the familiar Linux boot sequence and eventually get a login prompt. Log in as root (no password) and play in the new sandbox.</p>
<pre class="literal-block">
... usual kernel boot output ...
Welcome to Buildroot
buildroot login: root
# uname -a
Linux buildroot 3.15.0-rc2ajb-00069-g1aae31c #39 SMP Thu Apr 24 11:48:57 BST 2014 aarch64 GNU/Linux
</pre>
<p>Once you are done type <em>C-a c</em> to enter QEMU's monitor mode and then quit to exit.</p>
<pre class="literal-block">
QEMU 2.0.50 monitor - type 'help' for more information
(qemu) quit
</pre>
</div>
</div>
<div class="section" id="accessing-your-local-file-system">
<h2>Accessing your local file-system</h2>
<p>This is all very well but the test image only has a fairly limited root file-system attached to it. It would be a lot more useful if you could access your host file-system to test other binaries. Thanks to VirtFS we can achieve this without too much hassle. Use the following extended QEMU command line:</p>
<pre class="literal-block">
./aarch64-softmmu/qemu-system-aarch64 -machine virt -cpu cortex-a57 -machine type=virt -nographic -smp 1 -m 2048 -kernel aarch64-linux-3.15rc2-buildroot.img --append "console=ttyAMA0" -fsdev local,id=r,path=/home/alex/lsrc/qemu/rootfs/trusty-core,security_model=none -device virtio-9p-device,fsdev=r,mount_tag=r
</pre>
<p>This sets up the selected path to be mountable by the guest. In this case I'm using an Ubuntu rootfs which can be downloaded from <a class="reference external" href="http://cdimages.ubuntu.com/ubuntu-core/releases/14.04/release/">here</a>. Once the system has booted the following commands on the guest will mount the local file-system:</p>
<pre class="literal-block">
Welcome to Buildroot
buildroot login: root
# mount -t 9p -o trans=virtio r /mnt
# ls -l /mnt/
total 84
drwxr-xr-x 2 default default 4096 Apr 2 2014 bin
drwxr-xr-x 2 default default 4096 Feb 27 2014 boot
drwxr-xr-x 3 default default 4096 Apr 2 2014 dev
drwxr-xr-x 64 default default 4096 Apr 3 2014 etc
drwxr-xr-x 2 default default 4096 Feb 27 2014 home
..
</pre>
</div>
<div class="section" id="building-your-own-rootfs">
<h2>Building your own rootfs</h2>
<p>There are many solutions to this (including downloading <a class="reference external" href="http://www.linaro.org/downloads/">Linaro engineering builds</a>) but the simplest one I've found for rolling your own from scratch is the <a class="reference external" href="http://buildroot.uclibc.org/">Buildroot project</a>. It presents the familiar kernel menuconfig interface and deals with all the hassle of setting up cross compilers for you.</p>
<pre class="literal-block">
git clone git://git.buildroot.net/buildroot buildroot.git
cd buildroot.git
make menuconfig
</pre>
<p>There are lots of configuration options to choose from but the following are what I use:</p>
<div class="line-block">
<div class="line">* Target Options -> Target Architecture(AArch64)</div>
<div class="line">* Toolchain -> Toolchain type (External toolchain)</div>
<div class="line">* Toolchain -> Toolchain (Linaro AArch64 14.02)</div>
<div class="line">* System configuration -> Run a getty (login prompt) after boot (BR2_TARGET_GENERIC_GETTY)</div>
<div class="line">* System configuration -> getty options -> TTY Port (ttyAMA0) (BR2_TARGET_GENERIC_GETTY_PORT)</div>
<div class="line">* Target Packages -> Show packages that are also provided by busybox (BR2_PACKAGE_BUSYBOX_SHOW_OTHERS)</div>
<div class="line">* Filesystem images -> cpio the root filesystem (for use as an initial RAM filesystem) (BR2_TARGET_ROOTFS_CPIO)</div>
</div>
<p>The last one will be important for when we build the kernel next. Once you have configured buildroot to your liking it's time to type make and leave it for a while as you enjoy a nice lunch ;-)</p>
<pre class="literal-block">
make
.. lots of output ..
</pre>
</div>
<div class="section" id="building-a-kernel">
<h2>Building a kernel</h2>
<p>For building the kernel I use my distro's aarch64 cross-compiler. On Debian/Ubuntu systems this is easily added with:</p>
<pre class="literal-block">
$ sudo apt-get install gcc-aarch64-linux-gnu
</pre>
<p>And the usual kernel building process, with a few tweaks for cross compiling:</p>
<pre class="literal-block">
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux.git
cd linux.git
ARCH=arm64 make menuconfig
</pre>
<p>I've put my full config up <a class="reference external" href="http://people.linaro.org/~alex.bennee/images/aarch64-kernel.config">here</a> but important options to note are:</p>
<pre class="literal-block">
CONFIG_CROSS_COMPILE="aarch64-linux-gnu-" # needs to match your cross-compiler prefix
CONFIG_INITRAMFS_SOURCE="/home/alex/lsrc/qemu/buildroot.git/output/images/rootfs.cpio" # points at your buildroot image
CONFIG_NET_9P=y # needed for virtfs mount
CONFIG_NET_9P_VIRTIO=y
</pre>
<p>Finally you build it all with:</p>
<pre class="literal-block">
ARCH=arm64 make -j 8
</pre>
<p>The <em>-j 8</em> just specifies how many parallel build threads to use. Generally set it to the number of cores you have on your machine.</p>
</div>
<div class="section" id="final-test">
<h2>Final test</h2>
<p>All that remains is to test that the newly built kernel works:</p>
<pre class="literal-block">
./aarch64-softmmu/qemu-system-aarch64 -machine virt -cpu cortex-a57 -machine type=virt -nographic -smp 1 -m 2048 -kernel ../linux.git/arch/arm64/boot/Image --append "console=ttyAMA0"
... lots more output ...
Welcome to Buildroot
ajbtest login: root
[root@ajbtest ~]# ls -l
total 0
[root@ajbtest ~]# uname -a
Linux ajbtest 3.15.0-rc4ajb-00320-gafcf0a2-dirty #41 SMP Fri May 9 13:05:31 BST 2014 aarch64 GNU/Linux
</pre>
<div class="line-block">
<div class="line"><strong>UPDATED:</strong> 27/05/2014</div>
<div class="line">* Added notes about library dependencies</div>
<div class="line">* Cleaned up formatting of shell sections, mention length of command line!</div>
<div class="line">* Fix some spelling errors</div>
</div>
</div>