Seems like a cool project. Since they’re specifically targeting the bootstrap use case, it would have been cool to see some comparisons to the existing GNU full-source bootstrap that’s rooted in hex0. While a fair amount of bootstrapping effort is necessary to get from hex0 to a shell capable of running pnut, it does make me wonder if it could target a simpler subset of POSIX shell for which a simpler shell and bootstrap chain could be developed.
While a fair amount of bootstrapping effort is necessary to get from hex0 to a shell capable of running pnut, it does make me wonder if it could target a simpler subset of POSIX shell for which a simpler shell and bootstrap chain could be developed.
We chose POSIX shell in particular because we think it’s a very good foundation for diverse double-compilation because of its many implementations coming from different sources that are available on almost all platforms. This means we can gain trust in the POSIX shell implementations by simply running pnut on 2 different shells (ideally on 2 independent operating systems and hardware) and comparing the results. If the results are the same, either the 2 shells are compromised in the same way, or, more likely, they are not compromised at all. This allows us to start our fully reproducible build process at a higher abstraction level, saving us from writing machine-specific seeds.
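A minimal sketch of that cross-check, assuming nothing about pnut's real test setup: the function below runs one script under two shells and compares checksums of what each prints. The names compare_shells, pnut.sh and input.c are placeholders made up for illustration.

```shell
#!/bin/sh
# Sketch only: compare_shells and the file names in the usage line are
# hypothetical, not part of pnut.

# Run the same script with the same argument under two shells and succeed
# only if both produce byte-identical standard output.
compare_shells() {
  script=$1
  input=$2
  shell_a=$3
  shell_b=$4
  # cksum digests each shell's stdout so the two runs can be compared.
  sum_a=$("$shell_a" "$script" "$input" | cksum)
  sum_b=$("$shell_b" "$script" "$input" | cksum)
  [ "$sum_a" = "$sum_b" ]
}

# Usage (assumed file names):
#   compare_shells ./pnut.sh input.c dash ksh && echo "outputs match"
```

Matching output does not prove either shell is honest; as the reply says, it only leaves the less likely case where both are compromised in the same way.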
Since they’re specifically targeting the bootstrap use case, it would have been cool to see some comparisons to the existing GNU full-source bootstrap that’s rooted in hex0.
We’re still working on bootstrapping TCC (and then GCC) so we’re not there yet :) We see ourselves more as an alternative foundation on which to bootstrap GCC rather than as a competitor since we plan to reuse part of their bootstrapping path with TCC and GCC-4.7.4. Also, there may be room for pnut in GNU’s full-source bootstrap project since they end up building some basic shells which may be able to execute pnut.
This is cool!
Source code: https://github.com/udem-dlteam/pnut
Coincidentally, I was looking through Ribbit Scheme a few months ago by the same authors (which is mentioned in passing in the paper). It’s a very interesting tiny Scheme implementation, with a POSIX shell target (and many others):
https://github.com/udem-dlteam/ribbit
https://github.com/udem-dlteam/ribbit/blob/main/src/host/sh/rvm.sh - has a little garbage collector, starting on line 162
So you can use “dynamic binding” in shell like
to simulate a big array, and then you can write a GC on top of that.
It reminds me of some Lisp-in-awk implementations I looked at - awk has arrays, but not nested arrays with GC.
The C runtime doesn’t need GC of course, only Scheme, but it reminded me of that when I looked through that code
Some limitations:
As of now, pnut is complete enough to bootstrap itself and compile pnut-exe. Work to build TCC with pnut-exe is ongoing. Once this is achieved, building GCC is easy as it can be done with a known recipe from TCC
No support for floating point numbers and unsigned integers.
goto and switch fallthrough are not supported.
The address of (&) operator on local variables is not supported.
Arrays and structures cannot be stack-allocated or passed by value.
Function pointers and indirect calls are not supported.
This subset does seem usable for bootstrapping though!
Also table 3 of pnut running under different shells matches my experience - zsh is really slow for some reason! The rest of the shells are within a factor of 2 or 3 of each other. OSH is also in that general speed category, although it’s getting faster.
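The “dynamic binding” idea mentioned in this comment can be sketched roughly as follows. This is an illustration only, not the code from rvm.sh: heap_set, heap_get and the _h_ prefix are made-up names, and the sketch assumes a shell with POSIX arithmetic expansion.

```shell
#!/bin/sh
# Sketch only: heap_set, heap_get and the _h_ prefix are hypothetical names.

# Store an integer at a numeric index by assigning to the variable _h_<index>.
# $1 and $2 are expanded first, so the arithmetic sees e.g. "_h_7 = 42".
heap_set() {
  : $(( _h_$1 = $2 ))
}

# Read the cell back by evaluating the variable _h_<index>.
heap_get() {
  echo $(( _h_$1 ))
}

heap_set 7 42
heap_set 8 $(( $(heap_get 7) + 1 ))
heap_get 8
```

Because every cell is an ordinary shell variable, a collector written on top of this only needs integer indices and a convention for which cells are live.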
Author of the paper here,
Pnut was in part motivated by the POSIX shell Ribbit VM implementation. Writing POSIX shell script by hand is hard, and the resulting code isn’t the easiest to read! We’ve recently adapted the C Ribbit VM to be compatible with the subset supported by pnut, allowing us to generate a much clearer shell RVM from the C RVM implementation.
You’ll notice that this new version of the RVM doesn’t use eval or any external utility other than read and printf, which are things we disliked from the original shell RVM as it made it hard to read and not fully portable.
Our impression is that function calls are slower on zsh than on other shells. The table in section 3.2.2 of the paper shows that using set instead of let/endlet (1 extra function call per local variable) to simulate local variables brings an almost 3x performance improvement on zsh, and less than 2x on all other shells we’ve tested, and inlining of some frequently called functions also shows a larger improvement on zsh. We didn’t want to compromise readability for the performance of a particular shell, but it’s unfortunate that zsh, the default macOS shell, is this slow.
We’ve tried to run pnut with OSH but it seemed to not like the mix of negative numbers and comparison operators ([ $((-1)) -le 0 ] produces an Invalid integer constant '-1' error) so we were not able to include it in our benchmarks.
Thanks for the reply!
That’s cool that you can now generate a shell RVM from C! I didn’t notice that.
It makes sense not to use eval, although a bit of trivia is that shells with arrays like bash and ksh have hidden evals in arithmetic! I will put this in a second comment
zsh slowness
I also found that the zsh parser is slow, not just the runtime. It’s the slowest shell on this benchmark, and OSH is faster even though it incurs GC overhead for memory safety (unlike any other shell):
https://www.oilshell.org/release/0.24.0/benchmarks.wwz/osh-parser/
It is unfortunate it’s so slow, and the default on OS X.
OSH bug
Great bug, thank you! I just fixed it:
https://github.com/oils-for-unix/oils/commit/dc4f811557717e2d7405a9cb0d2b88b3a906c0fc
You can try it here if you like, or wait until the next release:
https://op.oilshell.org/uuu/github-jobs/8255/
https://op.oilshell.org/uuu/github-jobs/8255/cpp-tarball.wwz/_release/oils-for-unix.tar
I think this bug was due to sharing too much code between [ and [[. I also found a related bug, where [[ accepts octal integers, but [ doesn’t:
So now OSH matches this behavior! (though I hope new users will write YSH instead, rather than learning this crazy bash stuff)
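To re-check the negative-number case from the bug report under a particular shell, a small script like the one below could be run with that shell. It is only a sketch: check_negative_test is a made-up helper name, and the messages are illustrative.

```shell
#!/bin/sh
# Sketch only: check_negative_test is a hypothetical helper name.

# Succeeds if this shell's [ builtin accepts a negative integer that was
# produced by arithmetic expansion; any error message is discarded.
check_negative_test() {
  [ $((-1)) -le 0 ] 2>/dev/null
}

if check_negative_test; then
  echo "negative operands to [ are accepted"
else
  echo "negative operands to [ are rejected"
fi
```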
So basically the entire shell language is available, as long as you wrap it in a[$( mycode )].
In OSH, a[] only permits this under shopt --set unsafe_arith_eval, so by default there is no hidden eval.
I rediscovered this 5 years ago, and OpenBSD fixed it based on my report, but it still exists in bash and other shells:
https://github.com/oils-for-unix/blog-code/tree/main/crazy-old-bug
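A hedged way to see the hidden eval for yourself, assuming bash is installed: the sketch below hands bash's arithmetic evaluator a string whose array subscript contains a command substitution, then checks whether the embedded command left a file behind. hidden_eval_demo and the marker file are made up for this illustration.

```shell
#!/bin/sh
# Sketch only: hidden_eval_demo is a hypothetical name; it needs bash on PATH.

# Feeds a string containing a command substitution to bash arithmetic and
# reports whether the embedded command actually ran.
hidden_eval_demo() {
  marker=$(mktemp) || return 2
  rm -f "$marker"
  # Single quotes keep $(...) literal: the variable holds text, not a result.
  payload='a[$(touch '"$marker"')0]'
  bash -c 'x=$1; : $(( x ))' _ "$payload" 2>/dev/null
  if [ -e "$marker" ]; then
    rm -f "$marker"
    echo "embedded command ran"
  else
    echo "embedded command did not run"
  fi
}

hidden_eval_demo
```

On a bash with the behavior described in the comment, this would be expected to report that the command ran; a shell without the hidden eval would report the opposite.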
Actually, allowing : $(( a_$x = 1 )) but not arbitrary shell is something that we did for https://github.com/akinomyoga/ble.sh , which is like a fish shell written in bash!
Originally OSH had more static arithmetic. But ble.sh uses the same pattern that RVM does, so then we relaxed it a little bit.
I’d be interested if RVM can run with OSH after the fix, or any more bug reports.
I’d also be interested in hearing about further progress on bootstrapping! e.g. if tcc can be built
I posted this thread a couple months ago - https://lobste.rs/s/obokni/bootstrap_linux_system_from_512_byte
And I noted the differing philosophies on generated code
Also I remember talking to Marc Andre Belanger about Oils a couple years ago, and also about related projects such as Ribbit, which I found very cool!