Threads for andreypopp

    1. 13

      Hey lobsters. @jspahrsummers and I created this at Anthropic. I hope I didn’t cross any lines of self-promotion here, but I am super excited about this and wanted to share it.

      Let us know if you have any questions.

      1. 3

        Congrats on launching. There are a couple of weird things about this protocol launch:

        • It is not web compatible, as the Whimsical comment below mentions; e.g. it’s meant to be run locally
        • It is not usable from a website
        • There is no mention of prior art like Cohere Connectors, or even how this uplevels from using tool-calling functionality in LLMs

        When faced with a similar need in my hobby project, https://chatcraft.org, I found WebRTC to be a much more suitable transport layer for LLMs, as it can work for both local native apps and connect over the network securely. https://gist.github.com/tarasglek/ff3353169d94e82cbd91218ac43188d6 might be sufficient to grasp my approach

        1. 1

          We will work on remote connectivity. The current “implementation” is focused on local-only, but we hope the underlying primitives will hold true for remote connections.

          My understanding is that Cohere Connectors and others aim to solve something very different. MCP is trying to solve the NxM problem of applications to context providers and build in the open. Anybody is free to implement it both on the client as well as on the server side. It is also focused on general interaction primitives rather than pure data providers (hence the separation of prompts and resources for example). As such it’s fundamentally different from proprietary APIs that are bound to a specific product surface. I personally feel LSP is a much closer related concept and something that inspired us.

      2. 1

        Were you inspired by LSP?

        1. 2
      3. 1

        Why json schema and json rpc and stdio? I understand that json is the lowest common denominator (somewhat pejorative!) for data interchange with all kind of clients. Was making the transport work over http and stdio that important? Why couldn’t local agents just use http? Surely there must be a schema-based approach to specifying the entire protocol.

      4. 1

        I work at Whimsical. We have a GPT that people use to create diagrams and flowcharts, and I’d love to build something similar for Claude. The Model Context Protocol is very cool but it seems like it currently requires every context provider to write a server that will run locally on a user’s machine.

        It would be great if there was a server-side protocol that we could implement in our existing API which wouldn’t require writing a server. There are two reasons for this:

        1. Writing, maintaining, and distributing an MCP server is more work than implementing new API endpoints in our existing app.
        2. More importantly, it requires much less trust and effort for users to configure their LLM to point to a web service versus downloading and running a server that has local code execution.

        Do you have plans for something like this in the future?

        1. 2

          We are working on thinking through “remote MCP”, which would solve this (as I understand it). We started with local MCP because we wanted to see if the concepts hold up while we work on hard parts of remote transports, particularly getting authentication right. We will tackle that aspect next and certainly recognize that this would unlock a lot of use cases.

          (Also a good part of MCP comes from our own internal need and usage, where we can easily distribute servers)

    2. 9

      For example in my code I have this function to get a config filename from the environment variable:

      let get_config_filename: string =
        try
          let path = Sys.getenv "CONFIG" in
          path
        with Not_found ->
          "./websites.yaml"
      

      That’s not a function - that’s a named value. It’ll be evaluated only once. To make it a function, you’d need to take an argument (usually unit, i.e. ()), like:

      let get_config_filename () = (* ... *)
      

      I think this confusion is why some ML-syntax-like languages try to differentiate it with a different keyword for functions (like Rust’s fn) or using lambda syntax (like Roc’s name = \arg -> result.)

      Apart from that, I agree with the author’s conclusion that OCaml is hard to get into. It’s hard to figure out how you’re supposed to solve common problems, and there are precious few “blessed” approaches to doing common things. This is amplified by the OCaml 5 and domains situation because the earlier popular approaches for things aren’t necessarily how things should be done in the post-5 world. There’s been a few times where I look up how to do something, and decide on library/approach foo, but while I’m working on it I face issues like what the author did with crawl_website, and I’ll run into threads on reddit where people will be like “Obviously you shouldn’t use foo, use ppx_bar instead” and I’ll give up and resolve not to use OCaml again…

      1. 3

        Would following a guide like https://dev.realworldocaml.org/ help with the no-blessed-ways problem, in your opinion?

        (I ask because I think I want to try going down the OCaml rabbit hole using that book.)

        1. 6

          I haven’t looked into it too deeply, but even the newest edition of RW OCaml is not written for OCaml 5+, so it won’t cover domains and whatnot. Add to that the usual Async vs Lwt argument and I don’t recommend it right now.

          1. 1

            Oh, darn. Very good to know…

          2. 1

            nooooooooooo! I was hoping I could find a good resource to read.

            1. 3

              Not a guide, but recently read this user experience of somebody switching to Eio which might be nice for comparison: https://tarides.com/blog/2024-09-19-eio-from-a-user-s-perspective-an-interview-with-simon-grondin/ - I don’t have any personal experience with OCaml’s async ecosystem though.

            2. 2

              The readme is really good and worth a read https://github.com/ocaml-multicore/eio?tab=readme-ov-file

              1. 1

                oh neato!

        2. 3

          It’s me, the author of this article. Thanks a lot for the clarification. Does that mean that even if the env var value changes it won’t be reflected, because it is evaluated only once?

          1. 3

            kivikakk already covered your question but I’ll throw in a helpful tip for knowing the difference in the future: all functions in OCaml must have an -> in their type. You can use your editor’s LSP integration (e.g. in neovim I have to press K on a symbol to see its docs) to see that the inferred type of get_config_filename is string, while a function would be () -> string.
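
            For illustration, here is a minimal sketch of both forms side by side (the value is renamed config_filename here so the two can coexist); hovering each definition with an LSP client should show the inferred types:

            (* inferred type: string; the body is evaluated once, at definition time *)
            let config_filename : string =
              try Sys.getenv "CONFIG" with Not_found -> "./websites.yaml"

            (* inferred type: unit -> string; the body is evaluated on every call *)
            let get_config_filename () : string =
              try Sys.getenv "CONFIG" with Not_found -> "./websites.yaml"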

          2. 3

            Yes — you can prove this to yourself by trying something like this:

            let main () =
              let get_config_filename: string =
                print_endline "evaluating get_config_filename";
                try
                  let path = Sys.getenv "CONFIG" in
                  path
                with Not_found ->
                  "./websites.yaml"
              in
              print_endline get_config_filename;
              print_endline get_config_filename
            in
            main ()
            

            Output:

            evaluating get_config_filename
            ./websites.yaml
            ./websites.yaml
            

            That said, how are you anticipating the environment variable changing? OCaml’s stdlib itself doesn’t expose setenv or unsetenv (note aborted attempt here).

            edit: oh, never mind me! Unix.putenv is mentioned right there, which of course works on macOS. So you can try this:

            #load "unix.cma";;
            let main () =
              let get_config_filename: string =
                print_endline "evaluating get_config_filename";
                try
                  let path = Sys.getenv "CONFIG" in
                  path
                with Not_found ->
                  "./websites.yaml"
              in
              print_endline get_config_filename;
              Unix.putenv "CONFIG" "hello";
              print_endline get_config_filename
            in
            main ()
            

            Which outputs the same as above. Now note the change here:

            #load "unix.cma";;
            let main () =
              let get_config_filename () =
                print_endline "evaluating get_config_filename";
                try
                  let path = Sys.getenv "CONFIG" in
                  path
                with Not_found ->
                  "./websites.yaml"
              in
              print_endline (get_config_filename ());
              Unix.putenv "CONFIG" "hello";
              print_endline (get_config_filename ())
            in
            main ()
            

            And the output:

            evaluating get_config_filename
            ./websites.yaml
            evaluating get_config_filename
            hello
            
      2. 3

        This is amplified by the OCaml 5 and domains situation because the earlier popular approaches for things aren’t necessarily how things should be done in the post-5 world.

        Keep in mind that OCaml 5 is very very young and that 5.0, 5.1 and even 5.2 are not considered ready yet. I agree that OCaml often lacks a one-true way but OCaml 5 is not a good example of that.

        1. 8

          I agree that it is young, but at this point it objectively is the future of OCaml, so for someone just getting into it, it doesn’t make sense not to start with it. Doubly so because it affects IO, which is what most basic and common programs do! The author of this article started out writing a scraper, my first attempt was to write a Discord bot, and many people kick the tires with a web framework like Dream. So pre-5 approaches not being the future is a real problem: as a newcomer, is what I’m learning going to become outdated soon? Especially when it’s the core of what I’m doing?

          I don’t want to be overly negative about OCaml because I really like the language otherwise, so I’ll point out that there are people like @leostera on twitter who are working really hard on this problem and have made great progress. For people who want to get started with OCaml, I think https://github.com/leostera/minttea is natively OCaml 5 and really fun for making TUIs.

          1. 2

            I believe it does make sense to start with Lwt rather than with Eio now. There are simply more learning resources and more Lwt-compatible libraries. Lwt won’t become obsolete in the near future and the knowledge of OCaml+Lwt will transfer well to OCaml+Eio — same language after all.

      3. 1

        I have a very similar experience: every time I want to try something networking-related I am torn between Eio and Lwt.

    3. 2

      Does it support OSC 52 for copy/paste, which is very useful to propagate the clipboard across ssh/tmux/nvim, or is that superseded by having wezterm available remotely and multiplexing (no tmux)?

      1. 3

        Yup, it supports OSC 52.
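
        For reference, OSC 52 is just an escape sequence the application writes to the terminal; the terminal then places the payload on the system clipboard. A rough, hypothetical OCaml sketch of a “copy to clipboard” write (assuming the base64 opam package for Base64.encode_string):

        (* Emits ESC ] 52 ; c ; <base64 payload> BEL; "c" targets the system clipboard. *)
        let osc52_copy (text : string) =
          Printf.printf "\x1b]52;c;%s\x07%!" (Base64.encode_string text)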

    4. 2

      Really love CodeMirror 6; I’ve used it to build a REPL and a notebook for BQN.

    5. 1

      does ghostty support graphics in the terminal? With kitty (and tmux) I can render graphviz directly onto the terminal, which I find handy when debugging ragel grammars, for example

      1. 3

        Yes, it supports the Kitty graphics protocol.

        1. 3

          Also, I’ve been planning to start working on sixel - I’ve implemented color decoding, and just left it after that

    6. 6

      So we have JSX-like language support for:

      Any others?

      1. 3

        Technically Elixir has something similar to that thanks to sigil macros and HEEx.

      2. 2

        Pushup https://pushup.adhoc.dev is not JSX exactly, but it’s HTML-first blended with Go for control flow and variable access. (I’m the Pushup creator.)

        1. 1

          Thanks. Not sure what to think about the syntax though. Why not make it like JSX instead? Would also make it easier to port tooling and editor support to.

    7. 2

      This is pretty cool, I’m also doing AOC2022 partly in BQN, partly in my own K.jl (K on top of Julia).

    8. 5

      This is a nice way to test software! By the way OCaml’s default build system dune has this feature built-in.

    9. 1

      There’s also the Lemon parser generator (which is used by SQLite and comes from the same authors).

      1. 1

        Lemon is LALR(1), so it’s completely different.

        1. 1

          Sure, just wanted to point out a good alternative, also LALR is superior to LR! :-)

    10. 3

      I remember this was useful for some Advent of Code puzzle a few years ago.

    11. 2

      For those who like LALR parser generators there’s Lemon, the one used in SQLite.

    12. 4

      I wonder if you could support mutability somehow?

      I’m partly imagining torrent websites hosted on bittorrent (because it’s kinda meta) but could be generally useful/interesting perhaps.

      1. 4

        There’s a bittorrent protocol extension where you can distribute a public key that points to a mutable torrent, but I don’t know if it has been ported to webtorrent.

        1. 2

          The reference implementation for BEP0046 is done with webtorrent; don’t know if/how it works in the browser though.

          1. 2

            As far as I understand, you can’t use DHT in a web browser, as nodes do not support WebRTC. The Webtorrent project includes a DHT library that works with Node.js (which is used by the desktop application).

    13. 5

      I’ve worked at the company which developed HTSQL.

      It was used by software engineers and by (data/business) analysts.

      In my experience HTSQL was very good, especially for data exploration / creating ad-hoc reports. It is easy to pick up and very concise.

      Now the core developers of HTSQL are working on FunSQL; I posted about it a few days ago.


      Besides that piece of tech they have Rex Deploy — an Ansible-style database migration system — https://github.com/prometheusresearch/baseline-codebase/blob/baseline/master/src/rex.deploy/README.rst

      1. You can specify a table as a set of facts (you can even split definition into multiple facts)
      2. Easy to specify column renames
      3. Possible to put some dictionary-style data into the database as a part of migration
      4. Possible to drop down to SQL for something which isn’t supported out of the box by Rex Deploy.
    14. 3

      I see FunSQL.jl is already mentioned in the discussion, but I still want to highlight how it works, as I think FunSQL represents a very interesting point in the design space of “SQL replacements”.

      FunSQL is a Julia library which presents a combinator-based API to generate SQL. The query construction looks like:

      From(:person) |>
      Where(Fun.between(Get.year_of_birth, 1930, 1940)) |>
      Select(Get.person_id)
      

      I’ve ported the lib myself (with some deviations) to Python and OCaml (I ended up using the latter) and built a concrete syntax on top of it, so in the end it looks very similar to PRQL, though I’ve kept the syntax conservative (closer to SQL). Same query as above in the concrete syntax:

      from person
      where year_of_birth >= 1930 and year_of_birth <= 1940
      select person_id
      

      (I’ll use this syntax going forward in this post)

      Now the feature I want to highlight (and which I think makes FunSQL very useful) is how it treats group by and aggregate expressions. An example first:

      from users as u
      join (from comments group by user_id) as c on c.user_id = u.id
      select
        u.username,
        c.count() as comment_count,
        c.max(created_date) as comment_last_created_date
      

      Here the subrelation c, which is joined to u, is grouped by the user_id column. As you can see, c itself doesn’t specify any aggregates; all aggregates are located in the single select below.

      FunSQL treats grouped relations as a special kind of namespace from which you can either select columns you grouped by or refer to aggregate functions.

      In my mind, this feature is very powerful as it allows one to incrementally construct reusable query fragments and then use them to build final queries. A similar query, built in two steps:

      let users_with_comments =
        from users as u
        join (from comments group by user_id) as c on c.user_id = u.id
      
      from users_with_comments
      select u.username, c.array_agg(text) as comments
      

      In other words: you can build a nested namespace from relations / groupings and then later decide what you want to select from this namespace.

      This surprisingly works well if you want to implement visual query builders (I have one for my OCaml port).

      Check out the JuliaCon 2021 presentation about FunSQL if interested; the main author Kyrylo (who is, by the way, the co-creator of HTSQL, another query language mentioned in the discussion here) talks about the motivation and design of FunSQL.

    15. 3

      Doing AoC in BQN!

    16. 6

      They’ve added virtual lines! Would love to see REPL plugins support this to show output inline. Also theorem prover plugins!

      1. 3

        What is a virtual line?

        1. 9

          It is a line of text visible in the editor but is not present in the actual text buffer. Previously nvim supported virtual text inline, used to show warnings or other information after a line of code for example. Now entire lines can be inserted virtually.

      2. 2

        I might be poking at this in the future – editor integrations for annotated code.

    17. 2

      I wish I could buy JetBrains products packaged as LSP servers (or with extensions or maybe using some other protocol) so I can use them with Vim.

      1. 1

        Seems like they’re leveraging LSP, so this might not be too far fetched!

        https://www.jetbrains.com/help/fleet/1.0/architecture-overview.html

    18. 2

      I’ve enjoyed how they render Coq proofs with all the proof state extracted from Coq and available on hover.

      1. 1

        That is legitimately awesome. Does anyone have any idea what plugin / library enables that? And, if it works for other proof assistants like Isabelle? (I’m not a Coq user)

          1. 1

            Awesome, thank you. Wish it worked for Isabelle, but that’s great to have a tool to reference.

          2. 1

            Ahh, based on the README it seems like it has preliminary support for Lean 3, which is neat!

        1. 2

          You might also enjoy this presentation by the creator, and the paper that goes with it: Untangling mechanized proofs.

          1. 1

            Yes, that’s great content, thank you.

            Proof assistants are amazing tools, and some of them do offer a way to print out the full proof that they end up automating. Showing intermediate proof states is definitely nice, but some tactics still take very large leaps (like tauto used in the paper). In those cases, I want to see insight into how the step was taken in between goals as well.

    19. 5

      Planning the next release of https://bupstash.io/ / https://github.com/andrewchambers/bupstash . I am way behind schedule but want to do a closed beta of managed repositories.

      Also thinking a bit about a new programming language I want to work on for fun - a combination of Standard ML, drawing heavy inspiration from the https://github.com/janet-lang/janet runtime.

      I also have some ideas kicking around for my peer to peer package tree - https://github.com/andrewchambers/p2pkgs .

      So many things to do - clearly I need to avoid spending time on the less important things - I just have trouble reining in what my mind is currently obsessing over.

      1. 2

        Also thinking a bit about a new programming language I want to work on for fun - a combination of Standard ML, drawing heavy inspiration from the https://github.com/janet-lang/janet runtime.

        Do you mean you want to reimplement Standard ML but on top of a Janet-like runtime? Or is there something specific to Janet which can influence the SML language itself?

        I’m myself contemplating a compile-to-LuaJIT ML-like language: the efficiency of LuaJIT and its convenient FFI + the ergonomics of ML (though I’d want to experiment with adding modular implicits to the language).

        I also have some ideas kicking around for my peer to peer package tree - https://github.com/andrewchambers/p2pkgs .

        Is this related to Hermes (sorry for lots of questions but you have so many interesting projects)? Are you still using/developing it?

        Some time ago I was working on designing and implementing esy which is a package manager + meta build system (invoking package-specific build systems in hermetic environments) for compiled languages (OCaml/Reason/C/C++/…). It looks like Nix but has an integrated SAT solver for solving deps; we rely on package.json metadata and npm as a registry of package sources (though we can install from git repos as well).

        Personally, I think there’s a real opportunity to make a “lightweight” version of Nix/Guix which could be used widely, and Hermes seems to be aimed at this exact spot.

        1. 1

          Do you mean you want to reimplement Standard ML but on top of a Janet-like runtime? Or is there something specific to Janet which can influence the SML language itself?

          Mainly the way janet has a great C FFI, plus a few things like compile-time evaluation and the language’s compilation model. I also enjoy how janet can be distributed as a single amalgamated .c file like sqlite3. My main criticism of janet is perhaps the lack of static types - and Standard ML might be one of the simplest ‘real’ languages that incorporates a good type system, so I thought it might be a good place to start for ideas.

          I’m myself contemplating a compile-to-LuaJIT ML-like language: the efficiency of LuaJIT and its convenient FFI + the ergonomics of ML (though I’d want to experiment with adding modular implicits to the language).

          Yeah, the way it complements C is something I would love to capture. I am not familiar with modular implicits at all - but it sounds interesting!

          Is this related to Hermes (sorry for lots of questions but you have so many interesting projects)? Are you still using/developing it?

          Yes and no - p2pkgs is an experiment to answer the question: ‘what if we combined ideas from Nix with something like homebrew in a simple way?’ I think the answer is something quite compelling but it still has a lot of tweaking to get right. p2pkgs uses a more traditional package model - so far less patching is needed to build packages - and it is also conceptually easier to understand than nix/hermes - while providing a large portion (but not all) of the benefits. You could consider p2pkgs an exploratory search of ideas for ways to improve and simplify hermes. The optional p2p part was kind of an accident that seems to work so well in practice that I feel it is also important in its own way.

          1. 1

            while providing a large portion (but not all) of the benefits

            Could you possibly elaborate on which benefits are carried over, and which are not? I’m really interested in your explorations in this area of what I see as “attempts to simplify Nix”, but in this particular case, to the extent I managed to understand the repository, it’s currently very unclear to me what it really brings over just using redo to build whatever packages? Most notably, the core benefits I see in Nix (vs. other/older package managers), seem to be “capturing complete input state” of a build (a.k.a. pure/deterministic build environment), “perfectly clean uninstalls”, and “deterministic dependencies” including the possibility of packages depending on different versions of helper package. Does p2pkgs have any/all of those? It’s ok if not, I understand that this is just a personal exploration! Just would like to try and understand what’s going on there :)

            1. 2

              seem to be “capturing complete input state” of a build (a.k.a. pure/deterministic build environment)

              Yes it does, builds are performed in an isolated sandbox and use none of the host system.

              “perfectly clean uninstalls”, and “deterministic dependencies” including the possibility of packages depending on different versions of helper package.

              Packages are currently used via something I called a venv; this is more like nix shell, so it has clean uninstalls - each venv can use different versions of packages, but within a single venv you cannot mix versions - this is one of the downsides.

              it’s currently very unclear to me what it really brings over just using redo to build whatever packages?

              It uses redo + isolated build sandboxes + hashing of the dependency tree in order to provide transparent build caching; this is not so far removed from NixOS, which is why I feel NixOS might be over-engineered.

              One thing p2pkgs does not have is atomic upgrades/rollback unless it is paired with something like docker.

              All that being said, I think I oversimplified it to the point where the UX is not as good as it should be, so I hope to shift it back a bit to look a bit more like the nix cli - I think that will make things clearer.

              1. 1

                Thanks! I was confused about how redo works (had some wrong assumptions); now I start to understand that the main entry point (or, core logic) seems to be in the pkg/default.pkg.tar.gz.do file. I’ll try to look more into it, though at first glance it doesn’t seem super trivial to me yet.

                As to venv vs. NixOS, does “a linux user container” mean some extra abstraction layer?

                Also, I don’t really understand “container with top level directories substituted for those in the requested packages” too well: is it some kind of overlayed or merged filesystem, where binaries running in the container see some extra stuff over the “host’s” filesystem? If yes, where can I read more about the exact semantics? If not, then what does it mean?

                Back to “input completeness”: could you help me understand how/where can I exactly see/verify that e.g. a specific Linux kernel version was used to build a particular output? similarly, that a specific set of env variables was used? that a specific hash of a source tarball was used? (Or, can clearly see that changing one of those will result in a different output?) Please note I don’t mean this as an attack; rather still trying to understand better what am I looking at, and also hoping that the “simplicity” goal would maybe mean it’s indeed simple enough that I could inspect and audit those properties myself.

                1. 1

                  As to venv vs. NixOS, does “a linux user container” mean some extra abstraction layer?

                  Like Nixos, it uses containers to build packages, they are not needed to use packages - but they are helpful

                  Also, I don’t really understand “container with top level directories substituted for those in the requested packages” too well: is it some kind of overlayed or merged filesystem, where binaries running in the container see some extra stuff over the “host’s” filesystem? If yes, where can I read more about the exact semantics? If not, then what does it mean?

                  The build inputs are basically put into a chroot with the host system /dev/ added with a bind mount - this is quite similar to nixos - You can see it in default.pkg.tar.do

                  Back to “input completeness”: could you help me understand how/where can I exactly see/verify that e.g. a specific Linux kernel version was used to build a particular output?

                  Nixpkgs does not control the build kernel, not sure why you seem to think it does. Regardless - You can run redo pkg/.pkghash to compute the identity of a given package - which is the hash of all the build inputs including build scripts - again, much like nix. I suppose to control the build kernel we could use qemu instead of bwrap to perform the build. To see the inputs for a build you can also inspect the .bclosure file which is the build closure.

                  similarly, that a specific set of env variables was used? that a specific hash of a source tarball was used?

                  Env variables are cleared - this can be seen from the invocation of bwrap, which is a container - much like nixpkgs. I get the impression you might be misunderstanding the trust model of NixOS - NixOS lets you run package builds yourself - but it still relies on signatures/https/trust for the binary package cache - you can’t go back from a given store path and work out the inputs - you can only go forward from an input to verify a store path.

                  also hoping that the “simplicity” goal would maybe mean it’s indeed simple enough that I could inspect and audit those properties myself.

                  The entire implementation is probably less than 700 lines of shell; I think you should be able to read them all - especially default.pkghash.do and default.pkg.tar.gz.do.

                  1. 1

                    Thank you for your patience and bearing with me! I certainly might misunderstand some things from NixOS/Nixpkgs - I guess it’s part of the allure of p2pkgs that its simplicity may make those things easier to understand :) Though part of my problem is also that I’m having trouble expressing some of the things I’m thinking about here in precise terms, so I’d be super grateful if you’d have some more patience with me as I keep trying to express them more precisely! And sorry if they’re still confused or not precise enough…

                    Like Nixos, it uses containers to build packages, they are not needed to use packages - but they are helpful

                    Hm; so does it mean I can run a p2pkgs build output outside venv? In Nixpkgs, AFAIU, this typically requires patchelf to have been run & things like make-wrapper (or what’s the name, I seem to never be able to remember it correctly). (How) does p2pkgs solve/approach this? Or did I misunderstand your answer here?

                    The build inputs are basically put into a chroot with the host system /dev/ added with a bind mount - this is quite similar to nixos - You can see it in default.pkg.tar.do

                    What I was asking here about was the “Running packages in venv” section - that’s where the “container with top level directories substituted (…)” sentence is used in p2pkgs readme. In other words: I’m trying to understand how during runtime any “runtime filesystem dependencies” (shared libraries, etc.; IIRC that’d be buildInputs in nixpkgs parlance) are merged with “host filesystem”. I tried reading bwrap’s docs in their repo, but either I couldn’t find the ultimate reference manual, or they’re just heavily underdocumented as to precise details, or they operate on some implicit assumptions (vs. chroot? or what?) that I don’t have.

                    In other words: IIUC (do I?), p2pkgs puts various FHS files in the final .pkg.tar.gz, which do then get substituted in the chroot when run with venv (that’s the way buildInputs would be made available to the final build output binary, no?). For some directory $X present in .pkg.tar.gz, what would happen if I wanted to use the output binary (say, vim), run via venv, to read and write a file in $X on host machine? How does the mechanism work that would decide whether a read(3) sees bytes from $X/foo/bar packed in .pkg.tar.gz vs. $X/foo/baz on host machine’s filesystem? Or, where would bytes passed to write(3) land? I didn’t manage to find answer to such question in bwrap’s docs that I found till now.

                    Do I still misunderstand something or miss some crucial information here?

                    Nixpkgs does not control the build kernel, not sure why you seem to think it does. (…)

                    Right. I now realize that actually in theory the Linux kernel ABI is stable, so I believe what I’m actually interested in here is libc. I now presume I can be sure of that, because the seed image contains gcc and musl (which I currently need to trust you on, yes?), is that so?

                    Env variables are cleared (…)

                    Ah, right: and then any explicitly set env vars result in build script changes, and then because it’s hashed for bclosure (or closure, don’t remember now), which is also included in (b?)closures of all dependees, the final (b?)closure depends on env vars. Cool, thanks!!

                    1. 1

                      Hm; so does it mean I can run a p2pkgs build output outside venv? In Nixpkgs, AFAIU, this typically requires patchelf to have been run & things like make-wrapper (or what’s the name, I seem to never be able to remember it correctly). (How) does p2pkgs solve/approach this? Or did I misunderstand your answer here?

                      It replaces /bin and /lib but keeps the rest of the host filesystem when you run the equivalent of a nix shell. This seems to work fine and lets you run programs against the host filesystem. This works because on modern Linux kernels you can create containers and do bind mounts without root.

                      If we designed a package installer tool (and a distro?), it would also be possible to just install them like an alpine linux package.

                      I now presume I can be sure of that, because the seed image contains gcc and musl (which I currently need to trust you on, yes?), is that so?

                      You can rebuild the seed image using the package tree itself; the seed image is reproducible, so you can check the output seed is the same as the input seed. You need to trust me initially though before you produce your own seed.

                      Ah, right: and then any explicitly set env vars result in build script changes, and then because it’s hashed for bclosure (or closure, don’t remember now), which is also included in (b?)closures of all dependees, the final (b?)closure depends on env vars. Cool, thanks!!

                      That’s right :).

        2. 1

          Some time ago I was working on designing and implementing esy which is a package manager + meta build system (invoking package-specific build systems in hermetic environments) for compiled languages (OCaml/Reason/C/C++/…). It looks like Nix but has an integrated SAT solver for solving deps; we rely on package.json metadata and npm as a registry of package sources (though we can install from git repos as well).

          I feel like it should be possible to use the same solver ideas or something like go MVS in order to make a distributed package tree - this is another idea I really want to try to integrate in something simpler than Nix. I agree that it seems like a great thing and I definitely want it to be built.

          edit: I will investigate esy more - it definitely has most of what I want - The big difference seems to be how p2pkgs simply overrides / using user containers and installs them using DESTDIR.

          1. 1

            I feel like it should be possible to use the same solver ideas or something like go MVS in order to make a distributed package tree

            The depsolver is an interesting beast. I’m not satisfied with how it ended up in esy (though we had constraints to operate within, see below) — the feature I miss the most is the ability to create a separate “dependency graph” for packages which only expose executables (you don’t link them into some other apps) — dependencies from those packages shouldn’t impose constraints outside their own “dependency graphs”.

            Ideally there should be some “calculus of package dependencies” developed which could be used as an interface between depsolver and a “metabuildsystem”. That way the same depsolver could be used with nix/hermes/esy/… Not sure how doable it is though — people don’t like specifying dependencies properly but then they don’t like to have their builds broken either!

            edit: I will investigate esy more - it definitely has most of what I want - The big difference seems to be how p2pkgs simply overrides / using user containers and installs them using DESTDIR.

            Keep in mind that we had our own set of constraints/goals to meet:

            • esy is usable on Linux/macOS/Windows (for example we ship Cygwin on Windows, this is transparent to the users)
            • esy uses npm as a primary package registry (appeal to people who know how to publish things to npm, an open world approach to managing packages)
            • esy doesn’t require administrator access to be installed/used (the built artefacts are inside the home directory)
            • esy strives to be compatible with the OCaml ecosystem, thus the depsolver is compatible with opam constraints and esy can install packages from opam
            1. 1

              The depsolver is an interesting beast. I’m not satisfied with how it ended up in esy (though we had constraints to operate within, see below) — the feature I miss the most is the ability to create a separate “dependency graph” for packages which only expose executables (you don’t link them into some other apps) — dependencies from those packages shouldn’t impose constraints outside their own “dependency graphs”.

              This is much like in a general package tree - statically linked programs really don’t care - but some programs don’t support static linking, or provide dynamic libraries.

              Another challenge is that when you don’t have a monolithic repository of all packages, you have double the versioning problems to tackle - each version is really two - the package version and the packaged software version.

              My goals for a general package tree are currently:

              • Linux only (it worked for docker).
              • Doesn’t require administrator for building or using packages.
              • Allows ‘out of tree packages’ or combining multiple package trees.
              • Transparent global build caching from trusted sources (like nixos, I don’t think esy has this).