Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convince OpenBSD kernel developers to support an executable obtaining the path to its own binary #6718

Open
andrewrk opened this issue Oct 18, 2020 · 7 comments
Labels
contributor friendly This issue is limited in scope and/or knowledge of Zig internals. os-openbsd upstream An issue with a third party project that Zig uses.
Milestone

Comments

@andrewrk
Copy link
Member

OpenBSD takes the prize for the hackiest implementation of std.fs.selfExePath, a function which is readily available on Linux, Darwin/macOS, FreeBSD, DragonFlyBSD, NetBSD, and Microsoft Windows.

zig/lib/std/fs.zig

Lines 2235 to 2271 in 71ac5b1

.openbsd => {
// OpenBSD doesn't support getting the path of a running process, so try to guess it
if (os.argv.len == 0)
return error.FileNotFound;
const argv0 = mem.span(os.argv[0]);
if (mem.indexOf(u8, argv0, "/") != null) {
// argv[0] is a path (relative or absolute): use realpath(3) directly
var real_path_buf: [MAX_PATH_BYTES]u8 = undefined;
const real_path = try os.realpathZ(os.argv[0], &real_path_buf);
if (real_path.len > out_buffer.len)
return error.NameTooLong;
mem.copy(u8, out_buffer, real_path);
return out_buffer[0..real_path.len];
} else if (argv0.len != 0) {
// argv[0] is not empty (and not a path): search it inside PATH
const PATH = std.os.getenvZ("PATH") orelse return error.FileNotFound;
var path_it = mem.tokenize(PATH, &[_]u8{path.delimiter});
while (path_it.next()) |a_path| {
var resolved_path_buf: [MAX_PATH_BYTES]u8 = undefined;
const resolved_path = std.fmt.bufPrintZ(&resolved_path_buf, "{s}/{s}", .{
a_path,
os.argv[0],
}) catch continue;
var real_path_buf: [MAX_PATH_BYTES]u8 = undefined;
if (os.realpathZ(&resolved_path_buf, &real_path_buf)) |real_path| {
// found a file, and hope it is the right file
if (real_path.len > out_buffer.len)
return error.NameTooLong;
mem.copy(u8, out_buffer, real_path);
return out_buffer[0..real_path.len];
} else |_| continue;
}
}
return error.FileNotFound;
},

Let's put some friendly pressure on the OpenBSD project to improve this use case.

@andrewrk andrewrk added contributor friendly This issue is limited in scope and/or knowledge of Zig internals. upstream An issue with a third party project that Zig uses. os-openbsd labels Oct 18, 2020
@andrewrk andrewrk added this to the 1.1.0 milestone Oct 18, 2020
@semarie
Copy link
Contributor

semarie commented Oct 18, 2020

if I take my OpenBSD core-developer hat, I would say it will be complex 😃 but I am opened to discuss it

let's me try to explain the problem from kernel point of vue. kernel should only provider information which is accurate, else it could lead to obscure errors or eventually security issues. having a interface (syscall or sysctl entry) to retrieving the pathname of the current running executable which is accurate all the time is really complex:

  • the executable could have several pathname representing it (hard or soft symlinks), and user could be able to see it with one path and not with another
  • the user could be able to run the binary without having read permission on it
  • the file could be renamed, moved (from one directory to another), being unlinked, or replaced
  • a directory compoment of the path could be renamed, moved (directory move), replaced
  • ...
    so OpenBSD prefers provides no interface instead of an interface which could return errornous result.

Even Linux provides only a partial solution (to my knowledge). For example, zig code source has a comment in zig code saying readlink(2) will return garbage if the file is deleted (and there is no code for such case, even just panic).
And others OS implementations have other behaviour in such "ill" cases, like returning the pathname used at execve(2)-time, even if it points on a different file now. it is a easy footgun.

This lead to a second question: for what usage such path is need ? Because getting a pathname is per-se asking for trouble: the pathname could be out-of-date as soon as retrieved even if the kernel takes care of all the possible problems (see TOCTOU).

For citing an example, Rust env:current-exe() has been discussed a bit regarding this kind of problem, and several actions was done:

Regarding zig's standard library, selfExePath() is used for several things. Here all entrypoints resulting possible call to selfExePath() (which could return wrong/racy result):

  • fs.zig : selfExePath()
  • fs.zig : openSelfExe() - on !linux and !windows platforms
  • fs.zig : selfExeDirPath()
  • debug.zig : DebugInfo.getModuleForAddress() (via lookupModuleDl())
  • debug.zig : printSourceAtAddress()
  • debug.zig : dumpStackTraceFromBase(), writeStackTrace(), ...

Which makes me to think that every zig binary could potentially call such racy function to do the complex task of parsing a binary, whereas not all OS provides strong guarantee on the path quality. but I am unsure of the cases where stacktrace is printed (instead of an error return path).

Regarding zig compiler (the binary), it is using exclusively selfExePath():

  • for introspect.findZigLibDirFromSelfExe() - searching for "std.zig" file
  • for reexecuting zig in cmdBuild()

in both cases, the "traditional" way (when an application is source compiled) is to use a path provided at compile-time. but I agree it doesn't work for binary distribution where the installation directory isn't know at compile-time.

if the path returned is wrong, it would mean building a executable with "wrong" std, or code-execution to "wrong" binary. but I agree that such problem is more theorical than pratical.

Now, if I am returning to the original question to have such interface in OpenBSD kernel. I hope to have explained correctly why the kernel should not provide possibly inaccurate pathname. Eventually, an interface which return the descriptor (like for openSelfExe()) could be looked at, but the last time it was discussed there were problem regarding how to provide such descriptor without letting restricted programs to gain too many informations (because if any program could easily read the current executable, it could gain information on possible gadgets and their relatives positions for example).

@rohlem
Copy link
Contributor

rohlem commented Oct 20, 2020

I don't understand any concerns with letting an executable read its own data.
The naive, worst-case solution could always be to embed a copy of the entire (remaining) binary executable within some read-only data segment. That burdens the linker and doubles the file size, but there is no (realistic) way of an OS ever inhibiting this.

@spytheman
Copy link

@semarie

if I take my OpenBSD core-developer hat, I would say it will be complex 😃 but I am opened to discuss it

It may have a lot of complexities, but imho, it would be much better, to be done just once in the kernel, by qualified people like you, that do understand all the risks and potential pitfalls, then documented very well, with all the possible pitfalls, like most other platform do.

The current situation, leads to the fact, that each program and language implementation, that needs to be ported to OpenBSD, now has to have a lot of unreliable heuristics in user space, that are rediscovered over and over again.

For example, see: https://github.com/time-killer-games/libfilesystem/blob/ea009efea523439455327552f12fc71fa30c1e50/filesystem.cpp#L506C5-L506C5 and compare the complexity of the case for OpenBSD, for all other cases.
See also the comments for godotengine/godot#61540 .

@johnsonjh
Copy link

johnsonjh commented Jan 4, 2024

This would help us as well.

Please look at our _sir_getappfilename and the _sir_openbsdself function it uses for another example.

We run on Linux, Android, AIX, macOS, Windows, Cygwin, FreeBSD, NetBSD, OpenBSD, DragonFly BSD, GNU/Hurd, Haiku, Solaris, illumos, SerenityOS, and Node.js+Noderawfs.

Only the OpenBSD and AIX implementations are ridiculous and probably unreliable.

Edit: The reason we need this is to implement features that operate relative to the location of the executable in the filesystem, such as dynamically loading plug-ins that add new functionality, without requiring the user (or an installation program) configuring absolute paths, and not breaking an existing installation if the program is moved.

@time-killer-games
Copy link

time-killer-games commented Jan 13, 2024

@johnsonjh not to be too far off topic or anything but I'd like to help contribute to your project, may I have a means to contact you outside of github? I prefer discord or slack for realtime chat if you have one, that way when I tell you the things I have to say it wont go here or somewhere else where it doesnt belong. My discord is "samuelvenable" my slack is "Samuel Venable". Its easier to chat in realtime than on a github issue or pull request.

@time-killer-games
Copy link

time-killer-games commented Jan 13, 2024

@semarie I would like to mention, Windows prevents a lot of the issues you mentioned by outright preventing an executable from being moved or unlinked if it currently has an actively running process which spawned from it. That would be a very drastic change to how OpenBSD works if they were to go that route, but I have no doubt there are more elegant ways of handling this. For example, if a sysctl() had an interface for getting the executable path, and the path returned points to a hard link which either no longer exists or doesn't have a matching kinfo_file.va_fileid (stat()/fstat() ino_t) and kinfo_file.va_fsid (stat()/fstat() dev_t) then under those cases, let sysctl() have a return value to indicate failure and set errno to indicate what the actual error was. See this code I wrote for example: https://github.com/time-killer-games/xproc/blob/fd06a3e5978c99a33ef57bc13b577eab8648cdcd/process.cpp#L907-L1007 And of course, if the link returned by sysctl() is a soft link, just use realpath() or readlink() recursively until it returns a valid hard link, otherwise, return an error indicator and set errno. On error, the developer could then use a fallback solution or a fallback path option they come up with on their own, even if that means just using the working directory or $HOME/.confg/whatever. It would be up to them because the Kernel-level data would not be accurate or useable. There is even a way to prevent the concern you had with multiple hard links, if multiple hard links to the executable exists (stat()/fstat() st_nlink), again, return error value to sysctl() and set errno. All these failure cases could be properly documented and all your concerns are nullified by this in my honest opinion. I tried suggesting these things to the OpenBSD mailing list, and everyone was helpful and nice to me, except the project leader Theo de Raadt, (go figure), so I gave up on the possibility of ever contributing to OpenBSD, I want to apologize to him for how I wrongly reacted to him, but I doubt that will help the situation.

@johnsonjh
Copy link

johnsonjh commented Jan 14, 2024

@time-killer-games I've found Theo to be "hard" but fair, if a little bit stressed at times. I'd give him another chance for sure.

The "problem" with adopting some of the windows solutions is that it's impossible to do and maintain full POSIX semantics that BSD and other Unix systems target with their filesystem implementations. Making a change like that would break many existing applications that use a rename/remove/replace/re-execution workflow for updating themselves, for example.

Also, you need to remember that a program executing may be running directly from memory and might not even exist on the filesystem at all - or ever. Look at "crazy" things like https://github.com/XiphosResearch/netelf or https://github.com/Frodox/execute-machine-code-from-memory/

However, these corner cases are acceptable if the use case is correctly specified and the limitations documented, so application developers can avoid TOCTOU (time-of-check vs. time-of-use errors).

In most cases, when using a function like this, what the caller really wants to know is where the program was on the disk at the time that it was executed, and it shouldn't make any further assumptions beyond that, especially nothing that could be security sensitive.

I believe that Theo had commented that he didn't want to introduce an interface where you could easily introduce security issues if you misunderstood it, but perhaps I'm misremembering.

Edit: I'll try to reach you again later on Discord or via email.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor friendly This issue is limited in scope and/or knowledge of Zig internals. os-openbsd upstream An issue with a third party project that Zig uses.
Projects
None yet
Development

No branches or pull requests

6 participants