Threads for paulsmith

    1. 1

      I’ve just now taken an interest in learning how LLMs work and I’m thankful that you shared this! To those more in the know: how are mechanisms like attention and neural network layers usually implemented? All of the learning materials discuss these concepts with drawings of boxes and arrows going every which way, but I’ve never once seen it mentioned where someone could actually read an existing implementation to find out how it’s actually laid out. Also, is a model “just” an executable on disk with inputs and outputs, or is it more involved?

      1. 2

        Sorry for not answering your questions directly, but since you’re just starting out and expressing an interest in how things work, I would refer you to Andrej Karpathy’s YouTube channel, in particular this playlist (the first video of which is probably the single best resource for the average programmer to understand the fundamental building blocks of all deep learning).

    2. 3

      I don’t understand why people have negative thoughts about using debuggers. There isn’t a “tests or debugger” dichotomy, and there’s no reason not to use both. They’re different tools for solving different problems.

      I can’t imagine getting up to speed and making changes on a large code base without using a debugger.

      1. 2

        Probably because raw gdb and lldb are a pain to use. Unless you adopt a full-fledged IDE to do your work, there are hardly any good standalone debuggers with a decent UX for Linux.

        1. 2

          I’m keeping my eye on uscope which looks promising https://calabro.io/uscope

          1. 1

            I think I tried it a couple of weeks ago and it wasn’t usable in Wayland, with some pretty bad windowing bugs. :(

        2. 2

          Yeah, I’d agree with that.

          I generally prefer developing on Linux, but I have to admit Visual Studio’s C++ debugger is a lot easier to work with. I’m not a fan of the UI hangs and questionable default settings, but it’s quick and easy to use and the learning curve is much gentler than GDB’s.

    3. 3

      I have to say I’m a little surprised and disappointed that no one here came to stick up for Julius and say that C does have a virtual (fine, an abstract) machine … ;^)

    4. 6

      Has anyone used structured outputs with locally-hosted LLMs? That’s one thing that would see me switch pretty quickly.

      1. 3

        Llama.cpp has had this for a long while now. You can just define a grammar and force it to output, e.g., JSON; no need for examples.

          1. 5

            While it is an impressive feature, I would just caution that, unless you’re sticking with JSON/JSON Schema, arbitrary other grammars may perform less well than you might hope. The way it works is pretty clever: it takes the logits of the last output layer and keeps sampling until it finds a token that matches the next state transition in the grammar. Since LLMs are autoregressive, that new token is appended to the output string and fed back to the model for the next one, and so on, like normal. The problem is that the weights of the model were trained with a feedback loop of some objective function where the output was not a string in that grammar, so in a sense it’s sampling from a latent space slightly different from the one the model was tuned for. I optimistically thought I could coax it with the equivalent of the EBNF grammars of common PLs, but struggled to get meaningful results (at best syntactically correct, but nowhere near the semantics of my prompt). YMMV of course, and JSON does perform well since that’s now part of the training of most models.
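
            To make that concrete, here is roughly what the loop looks like, as a Go sketch with hypothetical Model and Grammar interfaces standing in for what llama.cpp actually implements in C++ against a GBNF grammar:

              // Sketch of grammar-constrained decoding. Model and Grammar are
              // hypothetical stand-ins, not a real llama.cpp binding.
              package sketch

              import "sort"

              type Model interface {
                  // Logits scores every token ID in the vocabulary, given the context so far.
                  Logits(context []int) []float64
              }

              type Grammar interface {
                  Allows(tok int) bool // would this token satisfy the next state transition?
                  Advance(tok int)     // consume the token, moving the grammar to its next state
                  Done() bool          // has the grammar reached an accepting state?
              }

              // constrainedSample greedily picks the highest-scoring token the grammar
              // will accept, appends it, and feeds the longer context back to the model.
              func constrainedSample(m Model, g Grammar, prompt []int) []int {
                  out := append([]int(nil), prompt...)
                  for !g.Done() {
                      logits := m.Logits(out)
                      order := make([]int, len(logits)) // token IDs ranked by logit, best first
                      for i := range order {
                          order[i] = i
                      }
                      sort.Slice(order, func(a, b int) bool { return logits[order[a]] > logits[order[b]] })
                      for _, tok := range order {
                          if g.Allows(tok) {
                              g.Advance(tok)
                              out = append(out, tok) // autoregressive: the new token becomes part of the input
                              break
                          }
                      }
                  }
                  return out
              }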

      2. 2

        You mean JSON? Llama 3.1 8B does fine for it. You should ideally give it examples mapping inputs to outputs, though. Especially Llama: that one’s almost hopeless without examples; they’re more important than the instructional part of the prompt.

    5. 5

      For the UI I like Ollama. It automates all the hassle of downloading and updating models, has a built-in JSON API, and a terminal chat interface.

      Models that fit in 8-16GB of RAM are shallow and hallucinate a lot. A 32GB MacBook can run mid-sized models, but barely. 48GB is needed to run a useful model and have other applications open at the same time.
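
      The JSON API is easy to script against, too. A minimal sketch in Go, assuming Ollama is listening on its default port and you’ve already pulled a model:

        // Minimal example of calling Ollama's local HTTP API.
        package main

        import (
            "bytes"
            "encoding/json"
            "fmt"
            "net/http"
        )

        func main() {
            body, _ := json.Marshal(map[string]any{
                "model":  "llama3.1", // whatever model you've pulled
                "prompt": "Why is the sky blue?",
                "stream": false, // return a single JSON object instead of a stream
            })
            resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
            if err != nil {
                panic(err)
            }
            defer resp.Body.Close()

            var out struct {
                Response string `json:"response"`
            }
            if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
                panic(err)
            }
            fmt.Println(out.Response)
        }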

      1. 1

        I’ve been idly thinking about picking up one of the new Mac minis with an M4 Pro and 48GB RAM, which is $1,799 US, and hanging it off my network (plus WireGuard for roaming access) as a dedicated local/private inference machine, extending the life of my old laptop.

        1. 3

          I’ve got an M3 Max, and the fan noise gets annoyingly loud when running LLMs for longer than a minute (chat is okay, but batch processing maxes out the thermals). For my next Mac I’m going to test whether the Studio’s beefy cooler is quieter under load.

      2. 1

        I have also found Ollama useful, but I only use it to run the models. It loads them on demand and unloads after a configured timeout (something like 5 minutes by default?). Since switching to it from llama.cpp I’ve found it convenient enough to run local models daily. Naturally, on Linux the Nvidia drivers have to be restarted with some modprobe calls when the machine wakes from sleep. I made a shell alias for it :)

        As a UI I’ve found Continue.dev, a VSCode extension, to work well enough to be useful when programming. It’s super janky with the Vim bindings, but at least I can select code, send it as context to the chat pane with a few mouse clicks, ask for edits, and copy the results back with another click.

    6. 4

      The Go standard library uses panic/recover as a control flow mechanism in parser code, for example.

      I do the same thing in the Pushup parser. With a recursive descent parser, since you’re using the host language’s call stack as a data structure, it’s much simpler to panic with a syntax error in a controlled way than to mix “regular” error values in with syntax errors and manually bubble them up.
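
      The shape of the idiom, roughly (a sketch, not Pushup’s or the stdlib’s actual code):

        // A recursive descent parser that panics on syntax errors internally
        // and converts the panic back into an ordinary error at the API boundary.
        package parser

        import "fmt"

        // syntaxError is an unexported type so recover() can tell our panics
        // apart from genuine runtime panics.
        type syntaxError struct {
            pos int
            msg string
        }

        type parser struct {
            input string
            pos   int
        }

        // errorf aborts parsing from arbitrarily deep in the call stack.
        func (p *parser) errorf(format string, args ...any) {
            panic(syntaxError{pos: p.pos, msg: fmt.Sprintf(format, args...)})
        }

        // Parse is the public entry point: the deferred recover turns a
        // syntaxError panic into a returned error.
        func Parse(input string) (result any, err error) {
            p := &parser{input: input}
            defer func() {
                if r := recover(); r != nil {
                    se, ok := r.(syntaxError)
                    if !ok {
                        panic(r) // not ours: re-panic
                    }
                    err = fmt.Errorf("syntax error at offset %d: %s", se.pos, se.msg)
                }
            }()
            return p.parseDocument(), nil
        }

        func (p *parser) parseDocument() any {
            // deep inside some production:
            //   if tok != expected { p.errorf("expected %q, got %q", expected, tok) }
            return nil
        }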

      1. 11

        This depends on the use case, but for language-tooling parsers, what I often do is treat syntax errors not as host-level errors, but as normal domain objects. A parse function produces a value and a list of errors; the “failing” function just consumes some input and pushes an error onto the list. Then some top-level while-not-end-of-file loop bails naturally when all the input is consumed.
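
        Concretely, something like this, as a Go sketch with placeholder types just to show the shape:

          // Errors are plain data: nothing panics, nothing returns early.
          package parser

          // Token, Decl, and File are hypothetical placeholder types.
          type Token struct{ Kind, Text string }
          type Decl struct{ Name string }
          type File struct{ Decls []Decl }

          type Error struct {
              Pos int
              Msg string
          }

          type parser struct {
              toks []Token
              pos  int
              errs []Error
          }

          func (p *parser) atEOF() bool { return p.pos >= len(p.toks) }

          // errorf records the problem as data and lets the caller keep going.
          func (p *parser) errorf(msg string) {
              p.errs = append(p.errs, Error{Pos: p.pos, Msg: msg})
          }

          // parseDecl always consumes at least one token, even on error, which is
          // what lets the top-level loop terminate naturally at end of input.
          func (p *parser) parseDecl() Decl {
              tok := p.toks[p.pos]
              p.pos++
              if tok.Kind != "ident" {
                  p.errorf("expected declaration, got " + tok.Kind)
                  return Decl{}
              }
              return Decl{Name: tok.Text}
          }

          // Parse produces a tree plus whatever errors accumulated along the way.
          func Parse(toks []Token) (File, []Error) {
              p := &parser{toks: toks}
              var f File
              for !p.atEOF() {
                  f.Decls = append(f.Decls, p.parseDecl())
              }
              return f, p.errs
          }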

        1. 4

          Makes sense: don’t have syntax errors affect control flow while parsing (at least in the sense of whether to exit parsing early). That seems like a more modern way of parsing, especially, as you mention, for language tooling. Will look into that for Pushup, since an LSP is on the horizon.

          1. 8

            If you are looking into LSP, then let me plug https://matklad.github.io/2023/05/21/resilient-ll-parsing-tutorial.html as well, as it sounds like it could be useful here!

    7. 1

      Excellent post, I learned a lot about CSS from your explanations!

      Curious why not use display: grid for the grids?

      1. 2

        Thanks! I’m glad you enjoyed it. In this case I wanted only that single utility class that just works, i.e. evenly dividing up the space (except the remainder). So here display: flex; sufficed. For more user-level control, display: grid; might be worth using. But yeah, overkill for this demo.
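
        For the curious, the utility class amounts to something like this (a sketch with a made-up class name, not the post’s exact code):

          /* each direct child gets an equal share of the row */
          .cols {
            display: flex;
          }
          .cols > * {
            flex: 1;
          }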

    8. 5

      You might want to avoid io_uring if[…] in particular you want to use features with a good security track record.

      I admit to being somewhat surprised to read this. Can you elaborate? Are you saying io_uring doesn’t have a good security track record because of its general newness, so it’s like other novel features that just haven’t had their time in the sun to be battle-hardened, or have there been flaws or exploits already that should give us pause?

      1. 3

        Previously on Lobsters… https://lobste.rs/s/wh2oze/put_io_uring_on_it_exploiting_linux_kernel

        io_uring doesn’t have a good security track record because of its general newness, … or have there been flaws or exploits already that should give us pause?

        Both, I guess.

    9. 6

      After a few years of using Nix as a package manager on my macOS devices, I felt confident enough to jump into the deep end last fall and use flake-based nix-darwin and home-manager to manage as much of the overall OS as possible. It’s been fantastic. I’ve never gotten wedged or into a weird state after running a darwin-rebuild switch, and have successfully managed two different machines with one config.

      There are some issues with macOS compatibility for a few nixpkgs, but usually in those cases I just install via Brew, which home-manager lets me include in the config. Having a declarative Brew config for that small number of cases is sort of the best of both worlds.

      Honestly, I’m surprised it’s lasted this long. I was prepared to pull the ripcord at any time if it became too much, but the setup has been really stable and productive. I’ve got a few Linux machines I’ll be bringing under this same config soon. Here’s my config: https://github.com/paulsmith/nixos-config I’m sure there are more efficient and more Nix-ish ways to do things; I’ve only just learned enough to make things work. And I still reach for nix flake init for a local devShell for all new project development.

      1. 3

        This is basically where I’m at (albeit my repo isn’t yet public). Nix where it makes sense, Homebrew where it doesn’t. Two macOS machines managed, both stay in sync, and I haven’t had many issues (most of my issues are from updating the nixpkgs input and then something failing to compile; I just wait it out a few more weeks before updating again).

    10. 3

      At least s6 uses a somewhat similar config format, where the contents of files are used without parsing.

      1. 2

        Yup, I first saw it in daemontools! DJB is against parsing :) And s6 is of that family of process supervisors.

        1. 2

          I’ve used Runit (similarly daemontools-inspired) as part of many deployments, and its chpst -e command provides this functionality. It was initially a bit odd compared to an .rc or .json or .yaml file, but operationally it’s very simple and foolproof. From the manpage:

          envdir. Set various environment variables as specified by files in the directory dir: If dir contains a file named k whose first line is v, chpst removes the environment variable k if it exists, and then adds the environment variable k with the value v. The name k must not contain =. Spaces and tabs at the end of v are removed, and nulls in v are changed to newlines. If the file k is empty (0 bytes long), chpst removes the environment variable k if it exists, without adding a new variable.
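
          Those rules are simple enough to sketch in a few lines of Go (this is just the semantics quoted above, not chpst’s actual implementation):

            // applyEnvDir applies the envdir rules: one file per variable, first line
            // is the value, an empty file removes the variable.
            package envdir

            import (
                "bytes"
                "os"
                "path/filepath"
                "strings"
            )

            func applyEnvDir(dir string) error {
                entries, err := os.ReadDir(dir)
                if err != nil {
                    return err
                }
                for _, e := range entries {
                    k := e.Name()
                    if e.IsDir() || strings.Contains(k, "=") {
                        continue // names containing "=" are not allowed
                    }
                    data, err := os.ReadFile(filepath.Join(dir, k))
                    if err != nil {
                        return err
                    }
                    if len(data) == 0 {
                        os.Unsetenv(k) // empty file: remove the variable entirely
                        continue
                    }
                    line, _, _ := bytes.Cut(data, []byte("\n"))  // first line only
                    v := strings.TrimRight(string(line), " \t")  // strip trailing spaces/tabs
                    v = strings.ReplaceAll(v, "\x00", "\n")      // NULs become newlines
                    os.Setenv(k, v)
                }
                return nil
            }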

    11. 13

      I had a similar experience with my small programming language: managing the state of tree-walking execution is much more difficult than managing the state of a small sequential-execution stack machine, and generating bytecode from an AST was easier than I thought it would be.

      1. 2

        This has also been my experience. The only “hard” thing about bytecode IMO is having to do things like patch up jump locations, e.g. for control flow - it just takes a little extra bookkeeping compared to the rest of the interpreter, whereas if you are walking the AST, the consequent or alternative code to execute after testing the conditional in an if statement is right there as a pointer in the node.
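
        For anyone who hasn’t written one, the bookkeeping looks roughly like this (a Go sketch with made-up opcodes):

          // Compiling `if cond { then } else { alt }` to bytecode: emit jumps with
          // placeholder operands, then patch them once the target offsets are known.
          package compiler

          const (
              OpJumpIfFalse byte = iota // operand: 2-byte absolute target address
              OpJump                    // operand: 2-byte absolute target address
          )

          type compiler struct {
              code []byte
          }

          // emitJump writes the opcode with a placeholder operand and returns the
          // offset of that operand so it can be patched later.
          func (c *compiler) emitJump(op byte) int {
              c.code = append(c.code, op, 0xFF, 0xFF)
              return len(c.code) - 2
          }

          // patchJump back-fills the placeholder with the current end of the code.
          func (c *compiler) patchJump(operandAt int) {
              target := len(c.code)
              c.code[operandAt] = byte(target >> 8)
              c.code[operandAt+1] = byte(target)
          }

          func (c *compiler) compileIf(cond, then, alt func()) {
              cond()                                // leaves the condition's value on the stack
              elseJump := c.emitJump(OpJumpIfFalse) // target unknown: else branch not compiled yet
              then()
              endJump := c.emitJump(OpJump) // skip over the else branch
              c.patchJump(elseJump)         // the else branch starts here
              alt()
              c.patchJump(endJump) // end of the whole if/else
          }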

    12. 1

      If the post’s author reads this, I think there’s a typo:

      On the surface, the call to rope.offset_to_point(30)

      Based on the context, I think it should be:

      On the surface, the call to rope.offset_to_point(26)

      1. 3

        Thanks for flagging!

        A correction is being deployed now.

        1. 2

          Also, the very first example seems incorrect, missing a space probably? I would expect: "Hello World!" + "This is your captain speaking." to result in: "Hello World!This is your captain speaking.", whereas you present: "Hello World! This is your captain speaking.". Or is there some functionality of adding an extra " " built-in? 🤔

          (Otherwise, an awesome article, thanks! 💖)

    13. 3

      The only thing I don’t like about HTMX is how it requires individual HTML fragments be accessible as responses to random requests.

      Somehow this overloads in a very unpleasant way my assumption that a good URL is one that gives you a fully formed resource. It might just be a matter of using it more though.

      1. 4

        When a request is made via HTMX, it has a specific header. With django-htmx ( https://django-htmx.readthedocs.io/en/latest/ ), that header is translated to a field on the Django request object: request.htmx.

        Usually, your route will return the full page, unless request.htmx is true (i.e., the HTMX header is set), in which case it returns the fragment.

        This is fine, because your templates are mostly organized as:

        <!-- resource-full.html -->
        <html>
          <head>...</head>
          <body>{% include "resource-fragment.html" %}</body>
        </html>
        

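        The same pattern outside Django, sketched in Go: htmx sets an HX-Request header on its requests, so a handler can branch on it (renderFull and renderFragment are hypothetical helpers):

          package main

          import "net/http"

          func resourceHandler(w http.ResponseWriter, r *http.Request) {
              w.Header().Add("Vary", "HX-Request") // the response differs based on this header
              if r.Header.Get("HX-Request") == "true" {
                  renderFragment(w) // just the fragment, to be swapped into the existing page
                  return
              }
              renderFull(w) // the full document, with the fragment included
          }

          func renderFragment(w http.ResponseWriter) {
              w.Write([]byte(`<div id="resource">...</div>`))
          }

          func renderFull(w http.ResponseWriter) {
              w.Write([]byte(`<html><body><div id="resource">...</div></body></html>`))
          }

          func main() {
              http.HandleFunc("/collection/resource", resourceHandler)
              http.ListenAndServe(":8080", nil)
          }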

        1. 1

          Thank you for the clarifications, but this also seems to violate the tenet I mentioned: one URL -> one Resource.

          1. 8

            It’s still the same resource. With different presentations based on a request header.

            It would be like:

            • http://example.com/collection/resource?format=json
            • http://example.com/collection/resource?format=yaml
            1. 2

              I think I got your and u/mxey’s post mixed up in my head and replied to a mix of both.

          2. 3

            You might be interested in Pushup, which has a notion of a “partial” that essentially makes a fragment of a page independently routable (i.e., gives it its own URL).

            https://pushup.adhoc.dev/docs/syntax#:~:text=%5Epartial

      2. 3

        I think I see what you mean, but I think you might be conflating a page with a resource. Would a CSS file be a fully formed resource in your book? Or an image used as part of some decoration? Fielding’s thesis §5.2.1.1 just defines a resource as:

        Any information that can be named can be a resource: a document or image, a temporal service (e.g. “today’s weather in Los Angeles”), a collection of other resources, a non-virtual object (e.g. a person), and so on.

        So, yeah; a small-ish chunk of a page can be a resource, and as others have said you can Vary: the response based on the headers that HTMX sets.

        I definitely understand your discomfort though; it does feel a bit weird. Personally, I prefer template engines where you can embed your fragments in a full HTML file so you can just preview them in a browser, without having to run your application. Sadly, this just means I’m frequently disappointed.

    14. 1

      I really like this approach! I had a similar notion of avoiding template languages with Pushup, which was to start with HTML and let the programmer switch to Go code for control flow and variable access, etc. It all compiles down to pure Go code in the end.

      What I like about JSX-Lua here is a dynamic language may be more pleasant to author web pages in than Go with this general approach.

      https://pushup.adhoc.dev

    15. 6

      So we have JSX-like language support for:

      Any others?

      1. 3

        Technically Elixir has something similar to that thanks to sigil macros and HEEx.

      2. 2

        Pushup https://pushup.adhoc.dev is not JSX exactly, but it’s HTML-first blended with Go for control flow and variable access. (I’m the Pushup creator.)

        1. 1

          Thanks. Not sure what to think about the syntax though. Why not make it like JSX instead? Would also make it easier to port tooling and editor support to.

    16. 4

      Lovely painless upgrade! sudo nix-channel --add https://channels.nixos.org/nixos-23.11 nixos && sudo nixos-rebuild boot --show-trace --upgrade-all took maybe 10 minutes on my work laptop, and everything Just Works™ after a reboot.

      Update: Time to upgrade my desktop PC - the build took almost an hour! Granted, it installs a few bits more, but ouch.

      1. 13

        Feels even better when you make it happen on tens of cloud VMs all at once. With one commit.

        1. 1

          Can it be done without executing nix-channel --add, just by configuration change?

          1. 9

            Yes, in your configuration.nix you can do:

              system.autoUpgrade = {
                enable = true;
                allowReboot = true;
                channel = "https://nixos.org/channels/nixos-23.11-small";
              };
            
          2. 3

            On my end, all my cloud stuff and personal stuff uses a single flake repo for configuration and such.

            So for me, the upgrade was a one-line patch (changing the nix flake input to 23.11), then having the CI auto-update the flake locks, build the update, and push it out into the world. My desktop doesn’t have auto-update set up though, since it messes with Bluetooth a bit.

            I can recommend diving into flakes, with a good CI setup it makes upgrades like this a breeze.

          3. 1

            I’m using a custom system with a daemon that makes hosts auto-update themselves to a desired configuration. https://github.com/rustshop/npcnix

      2. 2

        Same painlessness on my flake-based nix-darwin setup with home-manager. Changed two lines in my flake.nix (inputs.nixpkgs.url and inputs.home-manager.url edited from 23.05 to 23.11 as appropriate) and then darwin-rebuild switch --flake . was all it took. This is on an M1 Pro running macOS Sonoma 14.1.1.

    17. 5

      It’s essentially a shell command hook that runs prior to diff, which is nice and flexible. Curious: other than the example here of db diffs, what other binary file formats have folks seen using this in the wild, and what creative uses are possible?
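
      For reference, the setup is a .gitattributes entry naming a diff driver plus a textconv command for that driver; sqlite3-to-text here is a hypothetical one-line wrapper that runs sqlite3 "$1" .dump so the dumps get diffed instead of the binary:

        # .gitattributes
        *.db diff=sqlite3

        # .git/config (or ~/.gitconfig)
        [diff "sqlite3"]
            # git runs this with the path to a temp copy of the blob and diffs the text output
            textconv = sqlite3-to-text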

      1. 11

        I’ve used this before to transparently decrypt and diff ansible vault secrets. And I’ve seen it used to diff (text extracted from) PDFs.

        1. 3

          Oh blimey, I have a rant about ansible vault, in the form of a few thousand lines of code that I regret every part of. https://dotat.at/prog/regpg/doc/rationale.html

          The short version is that a good secret should have about 100 bits of entropy. There’s no reason a person should ever need to look at a decrypted secret: there’s nothing useful to see, just random bits. If secrets are properly separated from their metadata, the secrets can be kept encrypted and the metadata can be left in the clear. Then there is never any need for transparently (and carelessly) decrypting secrets for trivial reasons.

        2. 3

          Yep. Other examples include office files (Word, Excel, etc).

      2. 2

        Long ago I hooked it up to the ImageMagick compare utility, so changes to image files could be diffed by opening a window with the image, highlighting the changed pixels.

        I switched it off later because it meant that e.g. git log would randomly spawn new windows, but it was pretty handy aside from that annoyance.

        I’ve also hooked it up to sops ( https://github.com/mozilla/sops ) to decrypt and diff secrets.

    18. 7

      Slightly OT: several people have noted that at the end of the video the credits say it was shot entirely on iPhone, which is pretty notable when you consider that “for broadcast” productions like these typically require $10,000+ studio-grade cameras. The fact that a consumer smartphone lets you shoot 4K video that can be color-graded and post-processed like in “serious” workflows is pretty remarkable. I thought this video was a good technical explainer about the ProRes Log mode in the latest iPhones that makes this possible.

      1. 8

        Apple has now posted an explainer piece to cross-promote.

    19. 2

      Thanks for bringing quality improvements to the Nix ecosystem! I personally haven’t noticed any problems across macOS upgrades yet.

      Was I just lucky? Shall I migrate to the Nix installer? Can I migrate to it or do I need to do a clean install?

      1. 1

        I was about to ask a similar question - I just installed Nix via the Determinate Systems installer a few weeks ago. Is there an upgrade path to this new one, or is it uninstall/reinstall?

        (Very exciting, BTW.)

        1. 3

          Some users don’t have this issue. For example, I think nix-darwin has some tricks up its sleeve to avoid it.

          If you’d like to switch, you’ll need to do a full removal and then reinstallation. Unfortunately, as jvns noted, that uninstall can be tricky with the upstream installer… but once you’ve migrated, our installer makes it trivial to remove. This is also why we don’t currently support “adopting” an existing Nix install. We’ve done a lot of engineering to make the install revertible safely and thoroughly, and we don’t want to get it wrong and spoil someone’s day.

      1. 10

        I’ve got a bunch of questions, but I guess a few things (appreciating that from a career standpoint you may not be able to answer):

        • How did such a gaggle of contractors occur?
        • At any point did these people realize they kinda sucked, technically? Did they kinda suck?
        • Did anything end up happening to the folks that bungled the launch? Were there any actual monetary consequences?

        (I can guess as to most of these, given the usual way of things, but figured I might as well ask.)

        More on the tech side:

        • What sort of load were y’all looking at?
        • What was the distribution of devices, if you remember?
        • What were some of the most arcane pieces of tech involved in the project?

        Thanks for posting here, and thanks for helping right the ship!

        1. 21

          How did such a gaggle of contractors occur?

          Sorry in advance for the long-winded context-setting, but it’s the only way I know how to answer this question.

          There are a few important things to understand. First, even though HealthCare.gov was owned by CMS, there was no one single product owner within CMS. Multiple offices at CMS were responsible for different aspects of the site. The Office of Communications was responsible for the graphic design and management of the static content of the site, things like the homepage and the FAQ. The Center for Consumer Information and Insurance Oversight was responsible for most of the business logic and rules around eligibility, and management of health plans. The Office of Information Technology owned the hosting and things like release management. And so on. Each of these offices has their own ability to issue contracts and set of vendors they prefer to work with.

          Second, HealthCare.gov was part of a federal health marketplace ecosystem. States integrated with their own marketplaces and their Medicaid populations. Something called the Digital Services Hub (DSH) connected HealthCare.gov to the states and to other federal agencies like IRS, DHS, and Social Security, for various database checks during signup. The DSH was its own separately procured and contracted project. An inspector general report I saw said there were over 60 contracts comprising HealthCare.gov. Lead contractors typically themselves subcontract out much of the work, increasing the number of companies involved considerably.

          Then you have the request for proposals (RFP) process. RFPs have lists of requirements that bidding contractors will fulfill. Requirements come from the program offices who want something built. They try to anticipate everything needed in advance. This is the classic waterfall-style approach. I won’t belabor the reasons this tends not to work for software development. This kind of RFP rewards responses that state how they will go about dutifully completing the requirements they’ve been given. Responding to an RFP is usually a written assertion of your past performance and what you claim to be able to do. Something like a design challenge, where bidding vendors are given a task to prove their technical bona fides in a simulated context, while somewhat common now, was unheard of when HealthCare.gov was being built.

          Now you have all these contractors and components, but they’re ostensibly for the same single thing. The government will then procure a kind of meta-contract, the systems integrator role, to manage the project of tying them all together. (CGI Federal, in our case.) They are not a “product owner”, a recognizable party accountable for performance or end-user experience. They are more like a general contractor managing subs.

          In addition, CMS required that all software developed in-house conform to a specified systems design, called the “CMS reference architecture”. The reference architecture mandated things like: the specific number of tiers or layers, including down to the level of reverse proxies; having a firewall between each layer; communication between layers had to use a message queue (typically a Java program) instead of say a TCP socket; and so forth. They had used it extensively for most of the enterprise software that ran CMS, and had many vendors and internal stakeholders that were used to it.

          Finally, government IT contracts tend to attract government IT contractors. The companies that bid on big RFPs like this are typically well-evolved to the contracting ecosystem. Even though it is ostensibly an open marketplace that anyone can bid on, the reality is that the federal government imposes a lot of constraints on businesses to be able to bid in the first place. Compliance, accreditation, clearance, accounting, are all part of it, as well as having strict requirements on rates and what you can pay your people. There’s also having access to certain “contracting vehicles”, or pre-chosen groups of companies that are the only ones allowed to bid on certain contracts. You tend to see the same businesses over and over again as a result. So when the time comes to do something different – e.g., build a modern, retail-like web app that has a user experience more like consumer tech than traditional enterprise government services – the companies and talent you need that have that relevant experience probably aren’t in the procurement mix. And even if they were, if they walked into a situation where the reference architecture was imposed on them and responsibility was fragmented across many teams, how likely would they be to do what they are good at?

          tl;dr:

          • No single product owner within CMS; responsibilities spread across multiple offices
          • Over 60 contracts just for HealthCare.gov, not including subcontractors
          • Classic waterfall-style RFP process; a lot of it designed in advance and overly complicated
          • Systems integrator role for coordinating contractors, but not owning end-user experience
          • CMS mandated a rigid reference architecture
          • Government IT contracts favor specialized government contractors, not necessarily best-suited for the job

          At any point did these people realize they kinda sucked, technically? Did they kinda suck?

          I think they saw quickly from the kind of experience, useful advice, and sense of calm confidence we brought – based on knowing what a big transactional public-facing web app is supposed to look like – that they had basically designed the wrong thing (the enterprise software style vs a modern digital service) and had been proceeding on the fairly impossible task of executing from a flawed conception. We were able to help them get some quick early wins with relatively simple operational fixes, because we had a mental model of what an app that’s handling high throughput with low latency should be, and once monitoring was in place, going after the things that were the biggest variance from that model. For example, a simple early thing we did that had a big impact was configuring db connections to more quickly recycle themselves back into the pool; they had gone with a default timeout that kept the connection open long after the request had been served, and since the app wasn’t designed to stuff more queries into an already-open connection, it simply starved itself of available threads. Predictably, clearing this bottleneck let more demand flow more quickly through the system, revealing further pathologies. Rinse and repeat.

          Were there poor performers there? Of course. No different than most organizations. But we met and worked with plenty of folks who knew their stuff. Most of them just didn’t have the specific experience needed to succeed. If you dropped me into a mission-critical firmware project with a tight timeline, I’d probably flail, too. For the most part, folks there were relieved to work with folks like us and get a little wind at their backs. Hard having to hear that you are failing at your job in the news every day.

          Did anything end up happening to the folks that bungled the launch? Were there any actual monetary consequences?

          You can easily research this, but I’ll just say that it’s actually very hard to meaningfully penalize a contractor for poor performance. It’s a pretty tightly regulated aspect of contracting, and if you do it can be vigorously protested. Most of the companies are still winning contracts today.

          What sort of load were y’all looking at?

          I would often joke on the rescue that for all HealthCare.gov’s notoriety, it probably wasn’t even a top-1k or even 10k site in terms of traffic. (It shouldn’t be this hard, IOW.) Rough order of mag, peak traffic was around 1,000 rps, IIRC.

          What was the distribution of devices, if you remember?

          You could navigate the site on your phone if you absolutely had to, but it was pretty early mobile-responsive layout days for government, and given the complexity of the application and the UI for picking a plan, most people had to use a desktop to enroll.

          What were some of the most arcane pieces of tech involved in the project?

          Great question. I remember being on a conference bridge late in the rescue, talking with someone managing an external dependency that we were seeing increasing latency from. You have to realize, this was the first time many government agencies were doing things we’d recognize as 3rd party API calls, exposing their internal dbs and systems for outside requests. This guy was a bit of a grizzled old mainframer, and he was game to try to figure it out with us. At one point he said something to the effect of, “I think we just need more MIPS!” As in million instructions per second. And I think he literally went to a closet somewhere, grabbed a board with additional compute and hot-swapped it into the rig. It did the trick.

          In general I would say that the experience, having come from what was more typical in the consumer web dev world at the time - Python and Ruby frameworks, AWS EC2, 3-tier architecture, RDBMSes and memcached, etc. - was like a bizarro world of tech. There were vendors, products, and services that I had either never heard of, or were being put to odd uses. Mostly this was the enterprise legacy, but for example, code-generating large portions of the site from UML diagrams: I was aware that had been a thing in some contexts in, say, 2002, but, yeah, wow.

          To me the biggest sin, and I mean this with no disrespect to the people there who I got to know and were lovely and good at what they did, was the choice of MarkLogic as the core database. Nobody I knew had heard of it, which is not necessarily what you want from what you hope is the most boring and easily-serviced part of your stack. This was a choice with huge ramifications up and down, from app design to hardware allocation. An under-engineered data model and a MySQL cluster could easily have served HealthCare.gov’s needs. (I know this, because we built that (s/MySQL/PostgreSQL) and it’s been powering HealthCare.gov since 2016.)

          1. 4

            was the choice of MarkLogic as the core database

            A few years after your story I was on an airplane sitting next to a manager at MarkLogic. He said the big selling point for their product is the fact it’s a NoSQL database “Like Mongo” that has gone through the rigorous testing and access controls necessary to allow it to be used on [government and military jobs or something to that effect] “like only Oracle before us”.

            1. 4

              He’s probably referring to getting FedRAMP certified or similar, which can be a limiting factor on what technologies agencies can use. Agencies are still free to make other choices (making an “acceptable risk” decision).

              In HealthCare.gov’s case, the question wasn’t what database can they use, but what database makes the most sense for the problem at hand and whether a document db was the right type of database for the application. I think there’s lots of evidence that, for data modeling, data integrity, and operational reasons, it wasn’t. But the procurement process led the technology choices, rather than the other way round.

        2. 5

          I’m not Paul, but I recall hearing Paul talk about it one time when I was his employee, and one of the problems was MarkLogic, the XML database mentioned in the post. It just wasn’t set up to scale and became a huge bottleneck.

          Paul also has a blog post complaining about rules engines.

          See also Paul’s appearance on the Go Time podcast: https://changelog.com/gotime/262

      2. 5

        I’m happy to see that you feel proud of your work. It’s an incredibly rare opportunity to be able to help so many people at such a large scale.

        Have you found something else as meaningful for you since then?

        Also what’s your favorite food?

        1. 7

          Thank you!

          Have you found something else as meaningful for you since then?

          Yes, my work at Ad Hoc - we started the company soon after the rescue and we’ve been working on HealthCare.gov pretty much ever since. We’ve also expanded to work on other things at CMS like the Medicare plan finder, and to the Department of Veterans Affairs, where we rebuilt va.gov and launched their flagship mobile app. And we have other customers like NASA and the Library of Congress. Nothing will be like the rescue because of how unique the circumstances were, but starting and growing a company can be just as intense. And now the meaning is found not just in rescuing something but in building something the right way and being a good steward of it over time so that it can be boring and dependable.

          Also what’s your favorite food?

          A chicken shawarma I had at Max’s Kosher Café (now closed 😔) in Silver Spring, MD.

          1. 3

            Can you please kill X12. That is all.

            This is only slightly sarcastic. I’m hopeful that with your successful streak you can target the EDI process itself, because it’s awful. I only worked with hospice claims, but CMS certainly did a lot to make things complicated over the years. I think we only kept up because I had a well-built system in Ruby.

            It’s literally the worst thing in the world to debug and when you pair that with a magic black box VM that randomly slurps files from an NFS share, it gets dicey to make any changes at all.

            1. 11

              Seriously. ☠️

              For those not familiar, X12 is a standard for exchanging data, used by old-school industries like insurance and transportation, that well predates modern niceties like JSON or XML.

              X12 is a … to call it a serialization is generous; I would describe it more like a context-sensitive stream of transactions, where parsing is not a simple matter of syntax but depends on what kind of X12 message you are handling. They can be deeply nested, with variable delimiters, require lengthy specifications to understand, and it’s all complicated further by versioning.

              On top of that, there is a whole set of X12-formatted document types that are used for specific applications. Relevant for our discussion, the 834, the Benefit Enrollment and Maintenance document, is used by the insurance industry to enroll, update, or terminate coverage.

              To give you a little flavor of what these are like, here is the beginning of a fake 834 X12 doc:

              ISA*00*          *00*          *ZZ*CMSFFM         *ZZ*54631          *131001*1800*^*00501*000000844*1*P*:~
              GS*BE*NC0*11512NC0060024*20131001*1800*4975*X*005010X220A1~
              ST*834*6869*005010X220A1~
              BGN*00*6869*20131001*160730*ET***2~
              QTY*TO*1~
              QTY*DT*0~
              N1*P5*Paul Smith*FI*123456789~
              N1*IN*Blue Cross and Blue Shield of NC*FI*560894904~
              INS*Y*18*021*EC*A***AC~
              REF*0F*0000001001~
              REF*17*0000001001~
              REF*6O*NC00000003515~
              DTP*356*D8*20140101~
              NM1*IL*1*Smith*Paul****34*123456789~
              PER*IP**TE*3125551212*AP*7735551212~
              N3*123 FAKE ST~
              N4*ANYPLACE*NC*272885636**CY*37157~
              

              HealthCare.gov needed to generate 834 documents and send them to insurance carriers when people enrolled or changed coverage. Needless to say, this did not always go perfectly.

              I personally managed to avoid having to deal with 834s until the final weeks of 2013. An issue came up and we needed to do some analysis across a batch of 834s. Time was of the essence, and there was basically no in-house tooling I could use to help - the system only generated 834s; it didn’t consume them. So naturally I whipped up a quick parser. It didn’t have to be a comprehensive, fully X12-compliant parser; it only needed enough functionality to extract the fields needed to do the analysis.

              So I start asking around for anyone with 834 experience for help implementing my parser. They’re incredulous: it can’t be done, the standard is hundreds of pages long, besides you can’t see it (the standard), it costs hundreds of dollars for a license. (I emailed myself a copy of the PDF I found on someone’s computer.)

              I wrote my parser in Python and had the whole project done in a few days. You can see a copy of it here. The main trick I used was to maintain a stack of segments (delimited portions of an X12 doc); the main loop would branch to methods corresponding to the ID of the current segment, allowing context-sensitive decisions to be made about what additional segments to pull off the stack.
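
              The shape of that trick, transposed to Go just for illustration (the delimiters are actually declared in the ISA header; the usual * and ~ are assumed here):

                // A deliberately partial 834 reader: tokenize into segments, then branch
                // on each segment ID, pulling further segments as the context requires.
                package x12

                import "strings"

                type Segment struct {
                    ID       string   // e.g. "INS", "NM1", "DTP"
                    Elements []string // the *-delimited fields after the ID
                }

                // split tokenizes a document: "~" terminates a segment, "*" separates elements.
                func split(doc string) []Segment {
                    var segs []Segment
                    for _, raw := range strings.Split(doc, "~") {
                        raw = strings.TrimSpace(raw)
                        if raw == "" {
                            continue
                        }
                        parts := strings.Split(raw, "*")
                        segs = append(segs, Segment{ID: parts[0], Elements: parts[1:]})
                    }
                    return segs
                }

                type parser struct {
                    segs []Segment // the remaining segments, treated as a stack
                }

                func (p *parser) pop() Segment { s := p.segs[0]; p.segs = p.segs[1:]; return s }

                // Members groups the segments that belong to each INS (member) loop,
                // which is the kind of field extraction the analysis needed.
                func Members(doc string) [][]Segment {
                    p := &parser{segs: split(doc)}
                    var members [][]Segment
                    for len(p.segs) > 0 {
                        seg := p.pop()
                        if seg.ID != "INS" {
                            continue // header/trailer segments: skip for this analysis
                        }
                        member := []Segment{seg}
                        // keep pulling segments off the stack until the next member or the trailer
                        for len(p.segs) > 0 && p.segs[0].ID != "INS" && p.segs[0].ID != "SE" {
                            member = append(member, p.pop())
                        }
                        members = append(members, member)
                    }
                    return members
                }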

              The lesson here is that knowing how to write a parser is a super power that will come in handy in a pinch, even if you’re not making a compiler.

      3. 3

        Just wanted to give you props. I’m a contractor myself and I’ve been on both sides of this (I have been in the gaggle of contractors, and been one of the people they send in to try to fix things), and neither side is easy. I’ve never come close to working on anything as high-profile as this, and even those smaller things were stressful. IME contracting is interesting in some ways (I have worked literally everywhere in the stack and with tons of different languages and environments, which I wouldn’t have had the opportunity to do otherwise) and really obnoxious in other ways (legacy systems are always going to exist, but getting all the contractors and civilian employees pulling in the same direction is a pain, especially without strong leadership from the government side).

        1. 3

          Thanks, and back at ya. My post didn’t get into it, but it really was a problem of the lack of a single accountable product lead who was empowered to own the whole thing end-to-end. Combine that with vendors who were in a communications silo (we constantly heard from people who had never spoken to their counterparts on other teams, because the culture was not to talk directly except through the chain of command) and the result was everyone off in their corner doing their own little thing, but no one with a view of the comprehensive whole.

      4. 3

        How helpful were non-technical metrics such as user feedback and bug reports to the process?

        1. 12

          Extremely. Aside from the performance-related problems that New Relic gave us a handle on, there were many logic bugs (lots of reasons for this: hurried development, lack of automated tests, missing functionality, poorly-understood and complicated business rules that were difficult to implement). This typically manifested in users “getting stuck” partway through somewhere, where they could neither proceed to enrollment nor go back and try again.

          For example, we heard early on about a large number of “lost souls” - people who had completed one level of proving their identity (by answering questions about their past, a la a credit check), but due to a runtime error in a notification system, they never got the link to go to the next level (like uploading a proof-of-ID document). Fixing this would involve not just fixing the notification system, but figuring out which users were stuck (arbitrary db queries) and coming up with a process to either notify them (one-off out-of-band batch email) or wipe their state clean and hope they come back and try again. Or if they completed their application for the tax credit, they were supposed to get an official “determination of eligibility” email and a generated PDF (this was one of those “requirements”). But sometimes the email didn’t get sent or the PDF generator tipped over (common). Stuck.

          The frontend was very chatty over Ajax, and we got a report that it was taking a long time for some people to get a response after they clicked on something. And we could correlate those users with huge latency spikes in the app on New Relic. Turns out for certain households, the app had a bug where it would continually append members of a household to a list over and over again without bound after every action. The state of the household would get marshalled back and forth over the wire and persisted to the db. Due to a lack of a database schema, nothing prevented this pathological behavior. We only noticed it because it was starting to impact user experience. We went looking in the data, and found households with hundreds of people in them. Any kind of bug report was just the tip of the iceberg of some larger system problem and eventual resolution.

          I have a good story about user feedback from a member of Congress live on TV that led to an epic bugfix. I’ve been meaning to write it up for a long while. This anniversary is giving me the spur to do it. I’ll blog about it soon.