6th November 2023
After my pdb detour, I used the "prototype IDE" for a while to hack up a little platformer.
Not-even-saving-the-file code updates are definitely the way to go! It's very fun when you can just type things and see them pop up immediately after the keystroke that types the last letter.
I quickly discovered a variety of ways to crash the compiler (oops!), so I spent a while fixing all of those that I encountered. This made the editor/compiler/runtime combo stable enough that I could type random C for an hour or so without crashing everything.
I also added a fuzzer target hoping to further improve stability. It's just a simple integration for libFuzzer (aka -fsanitize=fuzzer in clang) and it passes the fuzz input directly to the compiler update function. This found some bugs in the tokenizer and preprocessor so it was kind of a success.
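For reference, a libFuzzer target of this kind can be very small. The following is only a sketch: compiler_update_from_source is a hypothetical stand-in for the real compiler update function, whose actual name and signature aren't shown in this post.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the compiler's update entry point. The real
 * function would tokenize, preprocess, parse and generate code. */
static int compiler_update_from_source(const char* source, size_t len) {
    (void)source;
    return len > 0;
}

/* libFuzzer calls this once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    /* The fuzz input is not NUL-terminated, so copy it into a buffer that is. */
    char* source = malloc(size + 1);
    if (!source) return 0;
    if (size > 0) memcpy(source, data, size);
    source[size] = '\0';

    /* Any crash or sanitizer report in here is a finding. */
    compiler_update_from_source(source, size);

    free(source);
    return 0; /* libFuzzer expects 0 from the target. */
}
```

Built with something like clang -g -fsanitize=fuzzer,address fuzz_target.c (the file name is made up), libFuzzer supplies main and drives the function with mutated inputs.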
But unfortunately, even with a helper corpus I only managed to find a couple of parser bugs, and no codegen bugs. It is highly improbable that I didn't find more parser and codegen bugs because they don't exist. Rather, the problem is that the fuzzer is unable to generate valid-enough semi-C-shaped inputs to make it to those codepaths.
I did a little hunting around for fuzzers that specialize in fuzzing compilers. It looks like some people have tried, but nothing too great, and I didn't have much success integrating them, so better fuzzing will have to wait for now. (Suggestions?)
This is of course a C compiler, and C compilers compile C code, and C code is eminently capable of writing its own crashes too!
So once the compiler wasn't crashing, my next source of flow-interruption was writing game code that dereferences null or reads out of bounds or whatever.
I haven't tried to work on this problem yet, but I think that will be the next thing to do in the game shell. It should be possible to add some amount of sandboxing to the game's C code so that it runs normally in-process, but can trap (say) access violations at the shell level and report a user error, while maintaining a functioning compiler REPL, so that the error can be repaired without restarting.
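One way this kind of trap-and-recover could look on POSIX systems is a signal handler that jumps back into the shell. This is a rough sketch and not something dyibicc does today: every name here is made up, a Windows version would need a different mechanism, and raise(SIGSEGV) stands in for a real bad pointer dereference so the example stays deterministic.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf g_recover_point;
static volatile sig_atomic_t g_in_game_code;

static void on_fault(int sig) {
    /* Only recover from faults raised while game code is running. */
    if (g_in_game_code) siglongjmp(g_recover_point, 1);
    /* A fault in the shell itself is a real bug: fall back to the default. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Runs one step of game code. Returns 0 if it completed, 1 if it faulted. */
static int run_game_step_guarded(void (*step)(void)) {
    struct sigaction sa, old;
    volatile int trapped = 0;

    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &old);

    /* savemask=1 so the signal mask is restored when we jump back here. */
    if (sigsetjmp(g_recover_point, 1) == 0) {
        g_in_game_code = 1;
        step();
    } else {
        trapped = 1; /* the shell would report a user error here */
    }

    g_in_game_code = 0;
    sigaction(SIGSEGV, &old, NULL);
    return trapped;
}

/* Hypothetical game steps for illustration. */
static void good_step(void) {}
static void crashing_step(void) { raise(SIGSEGV); }
```

After a trapped fault, the game's own state may be left half-updated, so a real shell would probably also need to reset or reload it before running the next step.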
When embedding the compiler into another program, libdyibicc.c and .h were required of course, but in addition, you also had to have a copy of (and point the compiler at) the compiler's built-in include directory. This didn't really spark joy because it gets into file system path manipulation, non-obvious initial setup steps, etc.
So instead now, all compiler built-in headers (stddef.h, etc.) are slapped into libdyibicc.c during packaging. So if you're embedding, all you need is the .c and .h. It still uses the system headers (for example, for libc) so those are still required in the normal places (%WindowsSdkDir%, /usr/include, etc.).
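As an illustration of the idea, headers embedded at packaging time could be stored as a table of name/contents pairs that the include lookup consults before touching the file system. This is a guess at the shape, not the actual libdyibicc.c layout: the struct, the function name, and the header bodies are all placeholders.

```c
#include <stddef.h>
#include <string.h>

typedef struct EmbeddedHeader {
    const char* name;     /* as written in #include <...> */
    const char* contents; /* full text of the header */
} EmbeddedHeader;

/* A packaging script would generate this table; the bodies here are
 * shortened placeholders, not the real headers. */
static const EmbeddedHeader embedded_headers[] = {
    {"stdbool.h", "#define bool _Bool\n#define true 1\n#define false 0\n"},
    {"stdalign.h", "#define alignas _Alignas\n#define alignof _Alignof\n"},
    {"stddef.h", "/* placeholder for the built-in stddef.h text */\n"},
};

/* Returns the embedded text for a built-in header, or NULL if the name is
 * not built in and should be looked up in the system include paths. */
const char* find_embedded_header(const char* name) {
    size_t count = sizeof embedded_headers / sizeof embedded_headers[0];
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(embedded_headers[i].name, name) == 0)
            return embedded_headers[i].contents;
    }
    return NULL;
}
```

With something like this, stdio.h would miss the table and still be resolved from the system include directories, which matches the split described above.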
Additional grab bag:
As a particularly lazy fellow, especially when prototyping, I'm annoyed and tired every time I have to make some janky linked list or MyObj objects[MAX_MY_OBJS]. So in a fit of hubris, I decided to build containers directly into dyibicc as a language feature.
Now what does it mean to be a language feature vs. just a built-in library? To me, I think it means "syntax", and that's where we veer into this being a questionable idea.
There are a huge number of Absolutely Fine container libraries for C with various tradeoffs (ctl, Klib, mlib, sgc, STC, and hundreds more). I semi-arbitrarily picked STC as being featured-enough, but not too heavy.
As with all C libraries, it's necessarily a bit "stutter"-y (especially with more complex types) as you need to repeat the namespace, the type, and the object in each function call.
To try to make this nicer, I added some extensions to dyibicc. The first is a pretty simple one: __attribute__((methodcall(PREFIX))). This is an attribute that goes on a struct declaration that makes it "sort of callable". So instead of writing:
struct my_vector_type { ... };
my_vector_type my_vec = {0};
my_vector_type_push_back(&my_vec, 123);
my_vector_type_push_back(&my_vec, 456);
my_vector_type_push_back(&my_vec, 789);
you can instead write:
struct __attribute__((methodcall(my_vector_type_))) my_vector_type { ... };
my_vector_type my_vec = {0};
my_vec..push_back(123);
my_vec..push_back(456);
my_vec..push_back(789);
The .. syntax and methodcall attribute work together, and if the static type of the left-hand side has a methodcall attribute, the right-hand side is rewritten using the PREFIX and a "self" argument. That is, with methodcall on the vector type, v..push(1) is rewritten to my_vector_type_push(&v, 1).
Additionally, .. follows pointers, so this works as expected, without needing to use -> or (*x).
int myfunc(my_vector_type* x) {
    x..push_back(14);
}
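To make the rewrite concrete, here is roughly what the calls above would turn into, written as plain C that any compiler accepts. The vector implementation is a made-up minimal one for illustration, and the pointer case assumes that x..push_back(14) simply passes x through as the "self" argument.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct my_vector_type {
    int* data;
    size_t len;
    size_t cap;
} my_vector_type;

/* Appends value, growing the storage as needed.
 * Returns 0 on success, -1 if allocation fails. */
int my_vector_type_push_back(my_vector_type* self, int value) {
    if (self->len == self->cap) {
        size_t new_cap = self->cap ? self->cap * 2 : 4;
        int* new_data = realloc(self->data, new_cap * sizeof *new_data);
        if (!new_data) return -1;
        self->data = new_data;
        self->cap = new_cap;
    }
    self->data[self->len++] = value;
    return 0;
}

void append_fourteen(my_vector_type* x) {
    /* x..push_back(14);  (x is already a pointer, so no & is added) */
    my_vector_type_push_back(x, 14);
}

/* Pushes 123, 456 and 14, then returns the last element (or -1 if empty). */
int demo_last_element(void) {
    my_vector_type my_vec = {0};
    my_vector_type_push_back(&my_vec, 123); /* my_vec..push_back(123); */
    my_vector_type_push_back(&my_vec, 456); /* my_vec..push_back(456); */
    append_fourteen(&my_vec);
    int last = my_vec.len ? my_vec.data[my_vec.len - 1] : -1;
    free(my_vec.data);
    return last;
}
```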
I thought about using . instead of .. for this, and while ambiguous (vs. normal struct field access) I think it could probably work fine. I wasn't sure if it'd be more or less confusing, and this was slightly easier to implement for now.
The second simplifying feature was to make the "templating" part automatic. In STC, you need to pre-define various i_* keys, then include a specific header which generates the associated structures and functions and then undefines those keys. So you need to figure out which types you want to use up-front and predefine them, and think about how they're going to be forward declared between translation units, and so on.
dyibicc has $vec and $map built in now, which deal with this automatically, so you can write $map(int, char*) or $vec(int) and the correct types will be instantiated and declared (or not) as necessary: simple test (notice no #includes). Assuming this doesn't feel terrible after using it for a while, I guess probably a string type and maybe a set type would be commonly useful too.
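For a sense of the up-front instantiation that built-in containers avoid, here is a macro-based sketch in plain C. It is neither STC's actual API (which works through i_* keys and header inclusion) nor how dyibicc implements $vec; DEFINE_VEC, vec_int and vec_str are invented names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Generates a typed vector struct plus push/drop functions for element type T. */
#define DEFINE_VEC(NAME, T)                                          \
    typedef struct NAME { T* data; size_t len, cap; } NAME;          \
    static int NAME##_push(NAME* self, T value) {                    \
        if (self->len == self->cap) {                                \
            size_t new_cap = self->cap ? self->cap * 2 : 8;          \
            T* grown = realloc(self->data, new_cap * sizeof(T));     \
            if (!grown) return -1;                                   \
            self->data = grown;                                      \
            self->cap = new_cap;                                     \
        }                                                            \
        self->data[self->len++] = value;                             \
        return 0;                                                    \
    }                                                                \
    static void NAME##_drop(NAME* self) {                            \
        free(self->data);                                            \
        self->data = NULL;                                           \
        self->len = self->cap = 0;                                   \
    }

/* Every element type has to be instantiated by hand before use. */
DEFINE_VEC(vec_int, int)
DEFINE_VEC(vec_str, const char*)

/* Fills both vectors and returns the combined element count. */
size_t demo_total_len(void) {
    vec_int numbers = {0};
    vec_str names = {0};
    vec_int_push(&numbers, 1);
    vec_int_push(&numbers, 2);
    vec_str_push(&names, "player");
    size_t total = numbers.len + names.len;
    vec_int_drop(&numbers);
    vec_str_drop(&names);
    return total;
}
```

The two DEFINE_VEC lines are the part that $vec(int) is meant to make unnecessary: the compiler works out which instantiations are needed from the types you actually write.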
There are many good, valid, correct arguments against doing this. Primarily, it just isn't really a C compiler any more, and your code isn't going to work in another C compiler. Or maybe you just think it's ugly! But what good is compiler power if you're not going to abuse it! So I'm going with Embrace, Extend, Extinguish for now. Surely domination of the C compiler market is just around the corner.
Paraphrasing the funniest email signoff I've received in a while: There can't possibly be a better way to end a blog post!
Comments or corrections? Feel free to send me an email.