December Adventure 2023¶

Last year I participated in the "AWKdvent of Code", my take on the Advent of Code using AWK. This year, let's try something else :)

The December Adventure!

┌───────────────────────────┐
│      December 2023        │
├───┬───┬───┬───┬───┬───┬───┤
│   │   │   │   │*1 │*2 │*3 │
├───┼───┼───┼───┼───┼───┼───┤
│*4 │*5 │*6 │*7 │*8 │*9 │*10│
├───┼───┼───┼───┼───┼───┼───┤
│*11│*12│*13│*14│ 15│ 16│ 17│
├───┼───┼───┼───┼───┼───┼───┤
│ 18│*19│*20│ 21│ 22│ 23│ 24│
├───┼───┼───┼───┼───┼───┼───┤
│ 25│ 26│ 27│ 28│ 29│ 30│ 31│
└───┴───┴───┴───┴───┴───┴───┘

Goals¶

Stuff I'd like to do:

implement the plane/line intersection for collision detection
rework the text layout/display on my (unreleased) text editor
finally do the PCB design to bring USB to a palm portable keyboard (foldable!)
implement texture selection in my live shader coding tool "bonz"
add a web page with a selection of shaders to be displayed on a web-browser

01¶

Created a quick demo page with an OpenGL shader that can run in the web browser. I would like to create custom HTML elements, like what Stargirl did with KiCanvas.

02¶

It was snowing outside today, so I spent the entire day on the computer...

Worked on a demo project; the goal is to create a 4K demo/intro for Linux. The idea is to compress a dynamically linked elf (linked against libSDL2) and embed it into another tiny, statically linked elf. The second elf decompresses the first one in memory and calls an exec syscall on it, so the ld interpreter is still used. I hope it makes sense.

Cleaned up a skeleton that uses SDL2 to display a single full-screen shader. Also hacked the GL loader "glad" to be more tightly integrated, so that it only "loads" (resolves) the OpenGL symbols that are actually used; this saves a lot of bytes.

Then I started cleaning up the "loader" part, which I had already begun a few months ago after finding a neat trick to execute an elf directly from memory:

int fd = memfd_create("", MFD_CLOEXEC);
execveat(fd, "", argv, envp, AT_EMPTY_PATH);

This trick is very close to the memfd_create() + fexecve() approach, except that it calls execveat() directly with AT_EMPTY_PATH. It doesn't change much, but it avoids having to craft a path such as /proc/pid/fd/3.

The execveat() man page suggests that it is used to execute a file located relative to a directory given by a file descriptor, which isn't what I need in my case. However, the AT_EMPTY_PATH flag changes its behavior: when the file path is the empty string "", the file referred to by the file descriptor itself is executed!
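To make the trick a bit more concrete, here is a minimal sketch of the whole sequence on Linux. It is not the demo's loader: the helper names (load_into_memfd, run_from_memory) are made up for illustration, the image is copied from an existing file instead of being decompressed, and raw syscall() is used so the example does not depend on the libc providing wrappers.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks in case the libc headers do not expose these constants. */
#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif
#ifndef MFD_CLOEXEC
#define MFD_CLOEXEC 0x0001U
#endif

extern char **environ;

/* Hypothetical helper: copy the file at `path` into an anonymous
 * memory-backed file and return its descriptor, or -1 on error. */
static int load_into_memfd(const char *path)
{
	char buf[4096];
	ssize_t n;
	int src, fd;

	src = open(path, O_RDONLY);
	if (src < 0)
		return -1;
	fd = (int)syscall(SYS_memfd_create, "", MFD_CLOEXEC);
	if (fd < 0) {
		close(src);
		return -1;
	}
	while ((n = read(src, buf, sizeof buf)) > 0) {
		if (write(fd, buf, (size_t)n) != n) {
			n = -1;
			break;
		}
	}
	close(src);
	if (n < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: fork, execute the in-memory copy of `path` in the
 * child, and return the child's exit status (or -1 on failure). */
int run_from_memory(const char *path)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		char *argv[] = { (char *)path, NULL };
		int fd = load_into_memfd(path);

		/* Empty path + AT_EMPTY_PATH: execute the descriptor itself. */
		if (fd >= 0)
			syscall(SYS_execveat, fd, "", argv, environ, AT_EMPTY_PATH);
		_exit(127); /* only reached when loading or exec failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, run_from_memory("/bin/true") should return 0 on a system where that binary exists. In a loader like the one described here, the write() loop would be fed with the decompressed bytes instead of a file.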

At that time I had a linker script to create a tiny elf solely using PHDRS. This script wasn't on my computer, but on the "old" one sitting next to me... I didn't bother powering this "old" computer again to get the previous script. Instead I tried to do something else this time: creating the elf header directly in C by using the system elf.h header and some symbols defined in a custom linker script.
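As an illustration of what "creating the elf header directly in C" can look like, here is a rough sketch using the types from elf.h. It is only an assumption of the general shape: the struct and function names are invented, the target is assumed to be x86_64, and the addresses and sizes are plain parameters here, whereas in my setup they come from symbols defined in the linker script.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: the elf header immediately followed by a single
 * program header, as found at the very start of a tiny executable. */
struct tiny_elf {
	Elf64_Ehdr ehdr;
	Elf64_Phdr phdr;
};

/* Fill the headers for one PT_LOAD segment mapping the whole file.
 * All four parameters are illustrative stand-ins for linker symbols. */
struct tiny_elf make_tiny_elf(uint64_t load_addr, uint64_t entry_offset,
			      uint64_t file_size, uint64_t mem_size)
{
	struct tiny_elf t;

	memset(&t, 0, sizeof t);

	memcpy(t.ehdr.e_ident, ELFMAG, SELFMAG);   /* "\177ELF" magic */
	t.ehdr.e_ident[EI_CLASS] = ELFCLASS64;
	t.ehdr.e_ident[EI_DATA] = ELFDATA2LSB;
	t.ehdr.e_ident[EI_VERSION] = EV_CURRENT;
	t.ehdr.e_type = ET_EXEC;
	t.ehdr.e_machine = EM_X86_64;              /* assumption: x86_64 */
	t.ehdr.e_version = EV_CURRENT;
	t.ehdr.e_entry = load_addr + entry_offset; /* virtual entry address */
	t.ehdr.e_phoff = offsetof(struct tiny_elf, phdr);
	t.ehdr.e_ehsize = sizeof(Elf64_Ehdr);
	t.ehdr.e_phentsize = sizeof(Elf64_Phdr);
	t.ehdr.e_phnum = 1;

	t.phdr.p_type = PT_LOAD;
	t.phdr.p_flags = PF_R | PF_W | PF_X;
	t.phdr.p_offset = 0;                       /* map from start of file */
	t.phdr.p_vaddr = load_addr;
	t.phdr.p_paddr = load_addr;
	t.phdr.p_filesz = file_size;               /* bytes present in the file */
	t.phdr.p_memsz = mem_size;                 /* bytes reserved in memory */
	t.phdr.p_align = 0x1000;

	return t;
}
```

With a linker script, the same initializer can be a static object placed in the first output section, so that the header ends up at offset 0 of the file.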

~~~

Things I've learned.

In the program header, the "physical" file size p_filesz must be smaller than or equal to the "virtual" memory size p_memsz; otherwise the elf loader refuses to load the program and the result is a "segfault".

The environment variables are needed by SDL, or by X11; without them it fails to connect to the X11 server.

The environment variables pointer envp is located right after the end of the args (past the NULL pointer that terminates argv).

/* pointer arithmetic: skip the argc entries of argv plus its terminating NULL */
char **envp = argv + argc + 1;

But then, where do argc and argv come from? It turns out they are passed by the kernel on the stack (at least on x86_64):

mov %rsp,%rdi ;; copy the stack pointer to register rdi

The stack should be organized as follows, with argc on "top" of the stack, followed by the argv pointers (starting with argv[0] at $rsp+8):

┌ $rsp                  ┌ $rsp+8
├──┬──┬──┬──┬──┬──┬──┬──┼──┬──┬──┬──┬──┬──┬──┬──┐
│          argc         │         argv          │
└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
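Putting the layout and the envp arithmetic together, here is a small sketch of how an entry point could recover all three values from the initial stack pointer. The struct and function names are hypothetical, and it assumes the stack pointer value has already been captured (for example by the mov above) and that pointers and stack slots have the same size, as on x86_64.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for what the kernel leaves on the stack. */
struct startup_args {
	int argc;
	char **argv;
	char **envp;
};

/* `sp` is the stack pointer value at process entry: sp[0] holds argc,
 * the argv pointers start at sp[1] and end with a NULL entry, and the
 * envp pointers follow right after that NULL. */
struct startup_args parse_initial_stack(uintptr_t *sp)
{
	struct startup_args a;

	a.argc = (int)sp[0];
	a.argv = (char **)(sp + 1);
	a.envp = a.argv + a.argc + 1; /* skip argv entries and the NULL */
	return a;
}
```

The result can then be handed to a regular main(argc, argv, envp) style function.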

Listening to 8-bit mentality, on a loop...

what a day

The next goal is to look at how LZMA works and how it can be decompressed.

03¶

Yesterday was far more "productive" than expected; today's goal is to keep it cool. The initial plan for this Sunday was to go outside for some moderate mixed alpine climbing, but we had to cancel due to the recent snowfall.

Today's goal is to look into ray-marching and having fun with shaders.

~~~

I have not looked much at lzma, but I've looked at using ray marching in shaders...

I haven't fully understood the usual ray marching implementation details (such as how to create a "camera"), but the general idea and the algorithm are quite "simple". The catch is how to create complex worlds, which I still don't understand.

I think the ray marching approach could be applied to create a simple collision system for games.

Started adding "timeline" controls into my live shader coding tool bonz.

04¶

Worked more on the "timeline" controls.

The time can be stopped and two markers can be placed on the timeline to create loops.

video

05¶

Started tinkering with a simple markdown parser in AWK. I want to be able to write my own implementation of notmarkdown, which I have already modified quite a bit. I want this version to support lists in quoted blocks...

06¶

Studied a bit more how LZMA works, but it is still a mystery.

LZMA stands for Lempel–Ziv–Markov chain Algorithm and, if I understood correctly, it is the combination of a Lempel-Ziv based compression (see LZ77) whose output is then further compressed by a range coder, which is an entropy coding method...

An actual LZMA file starts with the following structure:

1 byte with the compression configuration (the lc, lp and pb parameters packed as (pb * 5 + lp) * 9 + lc)
4 bytes for the dictionary size (little-endian encoded)
8 bytes for the uncompressed size (little-endian), or UINT64_MAX (all 0xff) for a stream, in which case the end is signalled by a special marker

0  +1         +4                       +8/
├──┼──┬──┬──┬──┼──┬──┬──┬──┬──┬──┬──┬──┼──
│cf│ dico_size │  uncompressed_size    │<stream start>
└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──
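The 13-byte header above can be decoded with a few lines of C. This is a sketch with made-up names (struct lzma_header, lzma_parse_header), assuming the classic ".lzma" layout where the multi-byte fields are little-endian and the first byte packs the lc, lp and pb parameters as (pb * 5 + lp) * 9 + lc.

```c
#include <stdint.h>

/* Hypothetical decoded view of the 13-byte header. */
struct lzma_header {
	unsigned lc, lp, pb;        /* literal context, literal pos, pos bits */
	uint32_t dict_size;         /* dictionary size in bytes */
	uint64_t uncompressed_size; /* UINT64_MAX when the size is unknown */
};

/* Returns 0 on success, -1 when the configuration byte is out of range. */
int lzma_parse_header(const uint8_t hdr[13], struct lzma_header *out)
{
	unsigned props = hdr[0];
	int i;

	if (props >= 9 * 5 * 5)
		return -1;
	out->lc = props % 9;
	props /= 9;
	out->lp = props % 5;
	out->pb = props / 5;

	/* 4 bytes, least significant byte first */
	out->dict_size = 0;
	for (i = 0; i < 4; i++)
		out->dict_size |= (uint32_t)hdr[1 + i] << (8 * i);

	/* 8 bytes, least significant byte first */
	out->uncompressed_size = 0;
	for (i = 0; i < 8; i++)
		out->uncompressed_size |= (uint64_t)hdr[5 + i] << (8 * i);

	return 0;
}
```

A header starting with 5d 00 00 80 00 followed by eight ff bytes decodes to lc=3, lp=0, pb=2, an 8 MiB dictionary and an unknown size, so such a file relies on the end marker.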

Then the stream starts with 5 bytes that are used to initialize the range coder; the first byte is always NUL (and doesn't have much impact).

──┼──┬──┬──┬──┬──┼──
..│r0│r1│r2│r3│r4│<...>
──┴──┴──┴──┴──┴──┴──

Code snippet for the range coder initialization. The first iteration has no effect on the final state of rc->code, which is 32 bits wide: the first byte gets shifted out by the four following iterations.

for (i = 0; i < 5; i++)
	rc->code = (rc->code << 8) + in->buf[in->pos++];
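To check the claim about the first byte, here is a tiny standalone version of that loop. The names (struct rc_state, rc_init) are invented for the example; setting range to 0xFFFFFFFF and treating a non-zero first byte as an error follow what I understand from the LZMA reference decoder, so take those two details as assumptions.

```c
#include <stdint.h>

/* Hypothetical minimal range decoder state. */
struct rc_state {
	uint32_t range;
	uint32_t code;
};

/* Initialize the range decoder from the first 5 bytes of the stream.
 * Returns 0 when the first byte is zero as expected, -1 otherwise. */
int rc_init(struct rc_state *rc, const uint8_t in[5])
{
	int i;

	rc->range = 0xFFFFFFFFu;
	rc->code = 0;
	/* The byte read at i == 0 is shifted out of the 32-bit value by the
	 * four following iterations, so only in[1..4] remain in rc->code. */
	for (i = 0; i < 5; i++)
		rc->code = (rc->code << 8) | in[i];

	return in[0] == 0 ? 0 : -1;
}
```

Feeding 00 12 34 56 78 or ab 12 34 56 78 leaves the same value in code (0x12345678); only the return value differs.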

There is so much more ...

07¶

More study on LZMA. And cleanup of shader initialization in bonz.

Nothing visual.

08¶

I can't remember this far back... I should have kept taking notes...

09¶

Got a simple LZMA decompression working but the size of the decoding binary is very high (~3.8K bytes).

10¶

Reduced the binary size by rewriting a few functions: passing parameters by value, reducing branches, using globals, and removing checks for error cases. In the end this saved ~250 bytes, which is not enough.

11¶

Started exploring other ways to save space on the LZMA decompressor. I've got several ideas:

compressing the lzma decoder with lz4
implementing an LZMA decoder with some form of byte code to achieve better code reuse

At that point I wanted to put all the pieces together and see, at least, whether the LZMA decoder works well enough to properly decode the demo... Somehow my laptop's kernel refuses to execute the tiny crafted elf and I don't know why. I'll have to take a look at this.

12¶

Fixed the issue with the tiny elf not being loaded by the kernel. It turns out the elf load address matters, despite being a virtual address; maybe this is because of the vDSO.

To debug this I looked at tmpout.sh, which has a lot of information about the elf executable and many links to other resources.

My previous attempt at creating a tiny LZMA decoder failed and I think this is caused by me attempting to optimise for binary size too early on, before having a working setup. Thus I started all over but with the objective of having a working setup first, only then to look at size optimisation.

Managed to get a single-file implementation of an LZMA decoder. I had to debug it alongside a working implementation, since some functions that I copied from my first attempt were broken.

13¶

Got an LZMA decoder working, but the loader failed to execute the binary directly from memory, despite successfully decompressing it.

It looked like memory corruption because the debug printfs I added were printing nonsense... But it turns out the execveat syscall simply returned a negative error code (in this case -14, aka -EFAULT) and my implementation of the %d decimal format had a bug when printing negative numbers.
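For reference, a libc-free decimal formatter that copes with negative values can be quite small. This is only a sketch, not the code from my loader: fmt_decimal is a made-up name, and it assumes int is at most 32 bits so that 12 bytes are enough for the output.

```c
#include <stddef.h>

/* Write `value` in decimal into `out` (at least 12 bytes), NUL-terminated.
 * Returns the number of characters written, not counting the NUL. */
size_t fmt_decimal(char *out, int value)
{
	char tmp[12];
	size_t n = 0, len = 0;
	/* Work on the magnitude as unsigned so that the most negative
	 * value does not overflow when its sign is flipped. */
	unsigned int mag = value < 0 ? 0u - (unsigned int)value
				     : (unsigned int)value;

	/* Produce the digits in reverse order. */
	do {
		tmp[n++] = (char)('0' + mag % 10);
		mag /= 10;
	} while (mag != 0);

	if (value < 0)
		out[len++] = '-';
	while (n > 0)
		out[len++] = tmp[--n];
	out[len] = '\0';
	return len;
}
```

With this, a return value of -14 is formatted as the three characters "-14".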

After a lot of investigation, stack-based variables seem to cause the issue, so I made every variable static and it "worked". The current total binary size is 4483 bytes, a bit above the target of 4096; the loader itself takes around 1667 bytes, which is a lot.

14¶

Today's goal is to investigate the stack corruption...

Growing the stack by subtracting from the stack pointer ($rsp on x86_64) "fixes" the issue, which confirms that there is a stack overflow or corruption.

After more investigation, this was caused by the inline assembly in the custom entry point, which had the naked attribute: the main function, declared static, was inlined into the entry point and somehow inherited the naked attribute. The issue can be fixed simply by removing the static qualifier from the main function.

I'll have to see if there is a way to make this startup code more compact. But that's for another day.

15¶

I went to Antibes to participate in a regatta for the weekend (16 and 17). We sailed for two days on a J130; this was the first time I was on such a huge boat (~13m). It was really nice and the weather was really hot for December. There was not much wind, but it was still fun.

18¶

On the road back from Antibes.

19¶

Fixed the startup in the demo project: the stack is now initialized by three instructions golfed into the elf header, which simply fall through to the main function. The main function isn't declared static anymore, which solves the stack initialization issue, and it is placed right after the elf header by the linker script.

20¶

Modified the demo to use X11 instead of SDL to open a window and get an OpenGL context, but the SDL version is still smaller than the one using X11; maybe I'll need to look at this more. I hope using SDL is authorised by the demoparty.

23¶

Participated in a local regatta on a nearby lake.

eol¶

Keeping a log is harder than I thought. I find it difficult to write and I tend to postpone writing until I don't remember what I did...