@rrbutani
Last active December 25, 2024 05:48
macOS sandbox notes
(placeholder to set the gist name)

(the assertion failures below indicate that sandbox profiles are evaluated by a Scheme interpreter)

Trying to reproduce/get to the bottom of this error: NixOS/nix#4119

top-level subexpression atom limit

this is a restriction I ran into while trying to create a sandbox Scheme file that trips the "pattern serialization length" error

there seems to be a limit on the number of atoms inside a top-level subexpression, and that limit seems to be 38834

counts just past the limit (38835 to 38840, inclusive) hit an assertion instead of a normal error: Assertion failed: (sc->gc_protect_lst == initial_gc_protect_lst), function opexe_0, file scheme.c, line 2933. past that range you just get profile compilation failed
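a hypothetical generator for probe files like the ones described in these notes. the actual script isn't included here, so the function names, the probe.sb file name, and the choice of putting everything in a single top-level list (mirroring the (0 0 0 ...) probes) are all assumptions:

```python
import pathlib


def make_probe(atom: str, count: int, version: int = 1) -> str:
    """Build profile text: a (version N) form, then one top-level list of `count` atoms.

    Hypothetical helper; `atom` is inserted verbatim, so pass '"a"' for strings.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    body = " ".join([atom] * count)
    return f"(version {version})\n({body})\n"


def write_probe(path: str, atom: str, count: int, version: int = 1) -> None:
    """Write a probe profile to `path` (e.g. "probe.sb")."""
    pathlib.Path(path).write_text(make_probe(atom, count, version), encoding="utf-8")
```

after writing a probe, something like sandbox-exec -f probe.sb /usr/bin/true shows whether the profile compiles; bisecting on count is then enough to find the threshold.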

whitespace, indentation, comments seem irrelevant

there does not seem to be a limit on top-level atoms, only top-level subexpression atoms

the length of each individual atom seems irrelevant

  • though, as an aside, strings seem to be limited to 1023 bytes (unicode is accepted); past this you get:
    sbpl1:35:4: Error reading string
    (list 'deprecated (lambda args (if (= 0 (length args)) (disable-full-symbolication) (error "unexpected argument"))))
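since unicode is accepted, the limit presumably counts bytes rather than characters. a minimal check, assuming the 1023 figure applies to the UTF-8 encoding of the string contents (that encoding is an assumption, and the constant is only the observed value):

```python
MAX_STRING_BYTES = 1023  # observed limit, not a documented constant


def string_fits(s: str) -> bool:
    """True if `s` stays within the observed limit, assuming it counts UTF-8 bytes."""
    return len(s.encode("utf-8")) <= MAX_STRING_BYTES
```

e.g. "\u00e9" * 512 is only 512 characters but 1024 bytes, so under this assumption it would already be too long.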
    

every kind of atom seems to count towards this limit, but different kinds count differently:

  • bare words (e.g. test) and parens (e.g. ()) count as 1
  • numbers and strings count as two (perhaps because these desugar to pairs or ()s?)
    • e.g. (0 0 0 ...) errors once it reaches 19417 zeros, and ("a" "a" "a" ...) errors once it reaches 19417 "a"s
      • (since 19417 * 2 + 1 = 38835 is the first total that exceeds 38834)

() counts as one towards the limit; (()) counts as two (as does (test)); ("a") counts as three; ("a" "a") counts as five

  • this is why the above has an extra + 1; the top-level (...) counts as 1
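the counting rules above can be turned into a small checker. this is a model of the observed weights (each list 1, bare words 1, numbers and strings 2), not sandbox-exec's actual reader; the tokenizer only understands parens, strings, numbers, bare words, and ; comments, and every name here is made up:

```python
from __future__ import annotations

import re

LIMIT = 38834  # observed with (version 1); not a documented constant

# whitespace | comment | string literal | paren | any other token
TOKEN = re.compile(r'\s+|;[^\n]*|"(?:\\.|[^"\\])*"|[()]|[^\s()";]+')
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?$")


def tokens(src: str):
    """Yield parens, strings, and other atoms, skipping whitespace and comments."""
    pos = 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        pos = m.end()
        tok = m.group()
        if not tok.isspace() and not tok.startswith(";"):
            yield tok


def weight(tok: str) -> int:
    """Observed weights: each list counts 1, bare words 1, numbers and strings 2."""
    if tok == "(":
        return 1
    if tok == ")":
        return 0
    if tok.startswith('"') or NUMBER.match(tok):
        return 2
    return 1


def top_level_counts(src: str) -> list[int]:
    """Weighted atom count for each top-level (...) form in `src`."""
    counts, depth, current = [], 0, 0
    for tok in tokens(src):
        if tok == ")" and depth == 0:
            raise ValueError("unbalanced ')'")
        if depth == 0 and tok != "(":
            continue  # bare top-level atoms don't seem to be limited
        current += weight(tok)
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth == 0:
                counts.append(current)
                current = 0
    if depth != 0:
        raise ValueError("unbalanced '('")
    return counts


def over_limit(src: str, limit: int = LIMIT) -> list[int]:
    """Counts of the top-level forms that would exceed the observed limit."""
    return [c for c in top_level_counts(src) if c > limit]
```

with these weights, a list of 19416 zeros totals 38833 and passes, while 19417 zeros totals 38835 and gets flagged, matching the numbers above.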

NOTE this restriction only seems to exist if you use (version 1); (version 2) and (version 3) happily accept more atoms in subexpressions

  • not sure what macOS version started accepting version 2/3 though

pattern serialization length error

using the repo linked here to reproduce this error: nix build .#devShell.aarch64-darwin --option sandbox true -vvvv

this happens regardless of profile version (1, 2, or 3)

the length that the error refers to seems to be the length of some processed artifact:

  • for example, there's definitely deduplication going on; copying the same subpath multiple times does not change the length in the error message
  • as others have noted, this processing/serialization step seems to operate on the full list; splitting up the paths into multiple top-level allows or duplicating paths in separate top-level allows has no effect on the length
  • the post processing is more sophisticated than just deduping; it also seems to eliminate implied subpaths:
    • e.g. (subpath "/tmp") on its own has the same length as (subpath "/tmp") (subpath "/tmp/blue")
    • same with literals:
      • (subpath "/tmp"); (subpath "/tmp") (literal "/tmp"); (literal "/tmp") (subpath "/tmp"); and (subpath "/tmp") (literal "/tmp/foo") all have the same length
  • looking at how the length reacts to paths like (literal "/a") and (literal "/u") being added in the presence of other paths, it definitely seems like there's some kind of regex minimization going on here
  • it also seems to eliminate literals and subpaths against regexes that imply them; not sure how sophisticated this analysis is but I have not been able to flummox it yet
  • naive "optimizations" like turning (subpath "/nix/store/foo") and (subpath "/nix/store/bar") into (regex #"^/nix/store/(foo|bar)") yield no change in length
  • I think this is an indication that sandbox-simplify disappeared because it was subsumed into sandbox-exec...
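a toy model of the simplification observed above: deduplicate, then drop any literal or subpath that another remaining subpath already implies. it's only meant to predict which inputs should leave the reported length unchanged. it doesn't model regexes or whatever minimization sandbox-exec actually performs, and Filter, covers, and normalize are made-up names:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    kind: str  # "literal" or "subpath"; regex filters are not modeled here
    path: str


def covers(outer: Filter, inner: Filter) -> bool:
    """True if every path `inner` matches is also matched by `outer`."""
    if outer.kind == "literal":
        # a literal only covers the identical literal
        return inner.kind == "literal" and inner.path == outer.path
    # a subpath matches the path itself and everything beneath it
    prefix = outer.path.rstrip("/") + "/"
    return inner.path == outer.path or inner.path.startswith(prefix)


def normalize(filters: list[Filter]) -> list[Filter]:
    """Drop duplicates, then drop filters implied by another remaining filter."""
    unique = list(dict.fromkeys(filters))  # dedupe, keeping first-seen order
    return [f for f in unique if not any(g != f and covers(g, f) for g in unique)]
```

under this model, each of the /tmp variants listed above normalizes to just (subpath "/tmp"), consistent with them all producing the same length.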

Also, some regexes do indeed produce this error

potential solutions

nesting sandbox-exec calls?

  • even if this is permitted, unless we switch to allowing all of /nix/store and denying files (i.e. inverting the filter) in these cases, I don't think this is possible
    • and, I don't think ^ is a good idea

reverse-engineering the bytecode format that the kernel accepts for these filter programs

  • it seems like what's going on is that sandbox-exec runs this Scheme program to get a filter program that contains regular expressions and such
  • it then compiles down this file of regexes, etc. into bytecode that the kernel executes
  • the "serialization length" limit seems to be a restriction on the length of this bytecode
    • it's not clear but I assume this is a limit imposed by the kernel and not just sandbox-exec
  • if we think we can do better in terms of cramming things into binary size (or adding escape hatches to permit a kind of nested sandbox thing as described above) then we could do the bytecode generation ourselves
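if we did generate the bytecode ourselves, we'd need some way to estimate how big a set of paths ends up. the real encoding isn't known here, so as a crude stand-in this sketch builds a character trie of literal paths and counts states before and after standard acyclic-DFA minimization (merging states with the same finality and equivalent transitions). the state count is only an illustration of why duplicates and shared prefixes/suffixes cost little, not a prediction of the actual serialized length; Node, build_trie, and count_states are made-up names:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    final: bool = False
    children: dict[str, Node] = field(default_factory=dict)


def build_trie(paths: list[str]) -> Node:
    """Character trie accepting exactly the given literal paths."""
    root = Node()
    for p in paths:
        node = root
        for ch in p:
            node = node.children.setdefault(ch, Node())
        node.final = True
    return root


def count_states(root: Node) -> tuple[int, int]:
    """Return (trie states, minimized states) for this acyclic automaton.

    Equivalent states (same finality, same transitions to equivalent states)
    are merged bottom-up, which gives the minimal DFA for a finite set of
    strings. Recursion depth grows with path length, so very long paths
    would need an iterative version.
    """
    registry: dict[tuple, int] = {}  # state signature -> canonical id
    total = 0

    def visit(node: Node) -> int:
        nonlocal total
        total += 1
        edges = tuple(sorted((ch, visit(child)) for ch, child in node.children.items()))
        return registry.setdefault((node.final, edges), len(registry))

    visit(root)
    return total, len(registry)
```

e.g. "/x/foo" and "/y/foo" take 12 trie states but only 7 once the shared "/foo" suffix is merged, and adding a duplicate path changes neither number.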

actually, this paper describes some findings on the mechanics of the sandbox bytecode: https://www.ise.io/wp-content/uploads/2017/07/apple-sandbox.pdf
