(the assertions indicate it's a scheme interpreter)
Trying to reproduce/get to the bottom of this error: NixOS/nix#4119
this is a restriction I ran into while trying to create a sandbox scheme file that trips the "pattern serialization length" error
there seems to be a limit on the number of atoms you can have in a top-level subexpression, and this limit seems to be 38834
going just past the limit (38835 to 38840, inclusive) hits an assertion: `Assertion failed: (sc->gc_protect_lst == initial_gc_protect_lst), function opexe_0, file scheme.c, line 2933.`
past this point you just get profile compilation failed
whitespace, indentation, comments seem irrelevant
there does not seem to be a limit on top-level atoms, only top-level subexpression atoms
the length of each individual atom seems irrelevant
- though, as an aside, strings seem to be limited to 1023 bytes (unicode is accepted); past this you get:
sbpl1:35:4: Error reading string (list 'deprecated (lambda args (if (= 0 (length args)) (disable-full-symbolication) (error "unexpected argument"))))
the kind of atom seems irrelevant, but different kinds of atoms seem to count towards this limit differently:
- bare words (i.e. `test`) and parens (i.e. `()`) count as 1
- numbers and strings count as two (perhaps because these desugar to pairs or `()`s?)
  - i.e. `(0 0 0 ...)` will error after 19417 zeros and `("a" "a" "a" ...)` will error after 19417 `"a"`s
    - (since (`19417 * 2 + 1`) is when you first exceed `38834`)
  - i.e. `()` counts as one towards the limit, `(())` counts as two (as does `(test)`), `("a")` counts as three, `("a" "a")` counts as five
    - this is why the above has an extra `+ 1`; the top-level `(...)` counts as 1
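The counting rules above can be written down as a small model. This is only a sketch of the observed behaviour, not anything taken from `sandbox-exec`; the `Symbol` class, the `weight` function, and the `LIMIT` constant are hypothetical names introduced for illustration, with Python lists standing in for parenthesised expressions and Python ints/strs standing in for numbers and strings.

```python
from dataclasses import dataclass

# Observed (not documented) limit for a top-level subexpression under (version 1).
LIMIT = 38834


@dataclass(frozen=True)
class Symbol:
    """Stand-in for a bare word such as `test` (hypothetical helper)."""
    name: str


def weight(expr):
    """Return how much `expr` appears to count towards the limit."""
    if isinstance(expr, list):
        # A pair of parens counts as 1, plus whatever it contains.
        return 1 + sum(weight(e) for e in expr)
    if isinstance(expr, Symbol):
        # Bare words count as 1.
        return 1
    if isinstance(expr, (int, str)):
        # Numbers and strings appear to count as 2.
        return 2
    raise TypeError(f"unsupported expression: {expr!r}")
```

Under this model `weight([0] * 19417)` is `19417 * 2 + 1`, the first value above `LIMIT`, which matches where `(0 0 0 ...)` starts failing.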
NOTE this restriction only seems to exist if you use `(version 1)`; `(version 2)` and `(version 3)` happily accept more atoms in subexpressions
- not sure what macOS version started accepting version 2/3 though
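For anyone wanting to probe the limit themselves, a throwaway generator along these lines may help. It is a minimal sketch: `make_profile` and its parameters are hypothetical names, and the generated file is only the kind of degenerate test input described above (one top-level list of repeated atoms), not a useful sandbox profile.

```python
def make_profile(n_atoms, atom="0", version=1):
    """Build profile text with one top-level list of `n_atoms` copies of `atom`."""
    body = " ".join([atom] * n_atoms)
    return f"(version {version})\n({body})\n"
```

Writing `make_profile(19417)` to a file and passing that file to `sandbox-exec -f` should be enough to compare behaviour on either side of the limit, and changing `version` to 2 or 3 should show whether the restriction goes away.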
using the repo linked here to reproduce this error
nix build .#devShell.aarch64-darwin --option sandbox true -vvvv
happens regardless of `version` (1, 2, and 3)
the length that the error refers to seems to be the length of some processed artifact:
- for example, there's definitely deduplicating going on; copying the same `subpath` multiple times does not change the length in the error message
- as others have noted, this processing/serialization step seems to operate on the full list; splitting up the paths into multiple top-level `allow`s or duplicating paths in separate top-level `allow`s has no effect on the length
- the post processing is more sophisticated than just deduping; it also seems to eliminate implied subpaths:
  - i.e. `(subpath "/tmp")` and `(subpath "/tmp") (subpath "/tmp/blue")` have the same length
  - same with literals: `(subpath "/tmp")` and `(subpath "/tmp") (literal "/tmp")` and `(literal "/tmp") (subpath "/tmp")` and `(subpath "/tmp") (literal "/tmp/foo")` all have the same length
- looking at how the length reacts to paths like `(literal "/a")` and `(literal "/u")` being added in the presence of other paths, it definitely seems like there's some kind of regex minimization going on here
- it also seems to eliminate `literal`s and `subpath`s against `regex`s that imply them; not sure how sophisticated this analysis is but I have not been able to flummox it yet
- naive "optimizations" like turning `(subpath "/nix/store/foo")` and `(subpath "/nix/store/bar")` into `(regex #"^/nix/store/(foo|bar)")` yield no change in length
- I think this is an indication that `sandbox-simplify` disappeared because it was subsumed into `sandbox-exec`
...
Also, some regexes do indeed produce this error
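To make the deduplication and implied-path observations above concrete, here is a toy model of that behaviour. It is purely a guess at the visible effect, not the algorithm `sandbox-exec` actually uses (which appears to work on regexes); `is_under` and `minimize` are hypothetical names, and rules are modelled as `(kind, path)` pairs with `kind` being `"subpath"` or `"literal"`.

```python
def is_under(path, root):
    """True if `path` equals `root` or lies beneath it on a component boundary."""
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


def minimize(rules):
    """Drop duplicate rules and rules implied by some subpath rule."""
    # Normalise trailing slashes so "/tmp/" and "/tmp" are treated as one path.
    unique = {(kind, path.rstrip("/") or "/") for kind, path in rules}
    subpaths = {path for kind, path in unique if kind == "subpath"}
    kept = set()
    for kind, path in unique:
        implied = any(
            is_under(path, root)
            for root in subpaths
            # A subpath rule does not imply itself.
            if not (kind == "subpath" and root == path)
        )
        if not implied:
            kept.add((kind, path))
    return kept
```

With this model, `(subpath "/tmp")` alone and `(subpath "/tmp") (literal "/tmp/foo")` both reduce to the same single rule, which is consistent with the two producing the same reported length.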
nesting sandbox-exec calls?
- even if this is permitted, unless we switch to allowing all of `/nix/store` and `deny`ing files (i.e. inverting the filter) in these cases, I don't think this is possible
  - and, I don't think ^ is a good idea
reverse-engineering the bytecode format that the kernel accepts for these filter programs
- it seems like what's going on is that `sandbox-exec` takes this scheme program that it runs to get a filter program that contains regular expressions and such
- it then compiles down this file of regexes, etc. into bytecode that the kernel executes
- the "serialization length" limit seems to be a restriction on the length of this bytecode
  - it's not clear but I assume this is a limit imposed by the kernel and not just `sandbox-exec`
- if we think we can do better in terms of cramming things into binary size (or adding escape hatches to permit a kind of nested `sandbox` thing as described above) then we could do the bytecode generation ourselves
actually, this paper describes some findings on the mechanics of the sandbox bytecode: https://www.ise.io/wp-content/uploads/2017/07/apple-sandbox.pdf