Threads for ewintr

    1. 8

      This is a major reason why I decided to only use models that I can self-host on my own computer. LLMs are flaky enough by themselves; I don’t want to constantly wonder whether OpenAI/Anthropic/Google/whoever pulled a trick again to save costs or because they decided they needed more censoring.

      The quality of open models is lower than that of the commercial ones, but on the plus side, I never have to be anxious about the price of processing a token.

    2. 7

      Will we always be forced to tolerate mediocrity in our work, at least on certain axes like efficiency, because we have to ship features now? I hope not.

      1. 31

        The work is delivering value to customers. Everything else is a distraction.

        1. 45

          I appreciate the spirit of this ethos, but I’m also currently dealing with a stomach bug from an undercooked restaurant meal yesterday. Sometimes shipping something right now delivers negative value to the customer.

          1. 3

            Hah, point well made. Hope you feel better soon!

        2. 8

          The work is delivering value to customers. Everything else is a distraction.

          It is as simple as that. Except that it is not that simple a statement. What is ‘value’ here?

          Sure, you might be shipping new features faster now, but maybe it becomes so hard to stay disciplined that your code turns into spaghetti and next year it takes more time to add features. What if I am a customer that doesn’t want new features anyway, but really hates slow responses?

          Not saying that one way is better than the other. But it is never easy or straightforward.

        3. 2

          This line is always used to defend the status quo, yet it works just as much against the idea of sacrificing reliability/performance/features for dev time as it does against the idea of sacrificing dev time for all that.

          If anything, isn’t dev time a much more dev-focused benefit than customer-focused?

          Either way, an N% reliability/stability improvement, for example, is much more valuable to a customer than you releasing X new features that year instead of X - Y (but not instead of X - Z, where Z > Y). And revolutionary new (beneficial) features that would have been impossible to write and/or maintain with an A-person team on the status quo stack are more valuable to the customer than an extra B evolutionary features that year (but not an extra C, where C > B).

          Obviously, that maxim doesn’t tell us anything about what should be sacrificed for what (that’s contextual), yet it’s only ever used to defend sticking with the status quo, trading one set of advantages for another that is no more customer-value-focused.

          (Performance difference is maybe not so noticeable in most domains that we talk about if you write sensibly, regardless of the language or framework.)

          The problem isn’t that customers notice the benefits of sticking with the status quo more than the benefits of moving to a sufficiently improved stack, it’s that the customer is comparing you on combined net benefit of features, performance, reliability, etc. to your competitor — not to you in an alternate universe where you use a different stack — and features are marketed much more than reliability.

          So there’s often a race to the bottom, probably because companies feel like they’re being more productive when they release more features, and so that’s how they measure productivity.

          Performance is often not as heavily marketed and probably isn’t that noticeable most of the time. But when customers do notice, and a competitor comes along that they can easily switch to, they’ll happily sacrifice a few features, since each customer probably only uses a (different) subset of the features anyway, and they’ll even give up a few non-critical ones they do use.

      2. 5

        Define “mediocrity”. A web application’s interface is the user interacting with it over a network. It’s not a native application, where efficient code is more important, and it’s not a software library, where terse code with few dependencies is more important. Rails seems like a fine choice in this context?

        1. 4

          As one example, look at the system requirements of ONCE Campfire, 37signals’s self-hosted chat web app. Requiring, or at least recommending, 8 GB of RAM and 4 vCPUs for 1000 concurrent users feels off by at least one order of magnitude. Tolerating mediocrity means knowing that such an app could probably run on a 2000s Linux-based home router and handle at least tens, if not hundreds, of concurrent users if it were built on a stack that prioritized efficiency. For context, Diversi-Dial could handle 7 concurrent (dial-up) users on an Apple IIe.
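
          For scale, that recommendation works out to roughly 8 MB of RAM and about 1/250th of a vCPU per concurrent user, for what are mostly idle chat connections.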

      3. 2

        Pretty solid chance of “yes”. You have to pick your tradeoffs. If you’re building a project by yourself or with really constrained development resources, it absolutely makes sense to pick up something like Rails as a tool. If that’s not something you’re willing to compromise on, in the same resource-constrained environment, it means that you’re cutting elsewhere. That’s not a bad thing necessarily. Just how it’s not necessarily a good thing. Sometimes “tolerating mediocrity” is fine. For what it’s worth, I don’t think Rails is mediocre.

      4. 2

        Rails may be less efficient than other approaches, but existing big projects are proof that it scales, albeit with more resources. So you don’t have to tolerate it; it just needs more work and resources.

    3. 27

      If I need to put Linux on a computer for a non-enthusiast to use, I always go with a KDE desktop, and those people have been happy with it; familiar enough to get into, plenty functional. GNOME has an uphill battle to become a desktop for all…

      1. 14

        I’m an enthusiast and I’m very much liking Plasma 6.2.2. Such a great desktop.

      2. 10

        See, I find the opposite: most non-technical, everyday users prefer Gnome over KDE, especially since MacOS and smartphone UIs are more popular these days.

        KDE might be windows-like, but it is too busy, too configurable for your average user. People care about apps, and getting things done.

        Gnome imitates the Apple/MacOS model and it’s working for them.

        Even I, as a power user, prefer Gnome over KDE. It gets out of my way and lets me get stuff done. I never fight it, it just works, and it’s pleasant to look at. It’s calm and not eye straining.

        We really need to get away from this KDE vs Gnome war. People prefer different things, and I hate how both sides rag on each other constantly for not using the same UI as them. Let people use what they like. They can always switch later if their needs change.

        1. 19

          Gnome imitates the Apple/MacOS model and it’s working for them

          From my experience with GNOME, that imitation is superficial and often misses the point. Things I value on macOS, which have been there since at least 10.2 (some are 10+ years older, but I didn’t use the system back then, and some of them were on Classic MacOS, others on OPENSTEP):

          • Keyboard shortcuts are the same everywhere. Even in the terminal, command-c copies and command-v pastes. Command-, always brings up preferences. Keyboard navigation between UI elements works in an expected order. Text-navigation shortcuts are the same in every text field.
          • Menu layouts are consistent and everything is in menus and so discoverable. I can search for things in menus by typing in the help menu and it shows me the expanded menu, pointing to the item I want, so I can find it later.
          • Drag and drop just works, everywhere[1]. I expect to be able to select something in one app and drag it to another. Every document-driven app has a file proxy icon in the title bar, I can drag that to the terminal to paste the full path, drag it to an email to attach that file, and so on.
          • File choosers are uniform. If I want to save a file in a directory I have open in the Finder, I just drop that directory in the save dialog and it works.
          • Almost every app supports sudden termination. I can install updates and the system just force quits every app and they all come back in the same state. This includes the Terminal, which will restore all of my windows and tabs, in the previous location, in the same directory, with the same UUID in an environment variable, which I use to make all of my ssh sessions transparently reconnect after reboot.
          • Closely related, autosaving is the default everywhere, respecting Raskin’s First Law. Losing data is a user action. If my machine crashes (which hasn’t happened for a while), I don’t lose any data.
          • The system provides a bunch of services that apps integrate with. Spell checking is uniform across anything that uses NSTextView, which is most things because it’s rich enough to build a full DTP system. Address book, calendar, password management, and so on are all services provided to all apps, so only the really stupid ones (looking at you, MS Office) roll their own; everything else that needs them integrates with them.
          • Every app exposes functionality to AppleScript for scripting. For my first book, I did a load of diagrams in OmniGraffle (which remains my favourite drawing program). I wrote a rule in my Makefile to invoke an AppleScript to do the .graffle to .pdf conversion and this was about four lines of AppleScript.
          • Text Services extend the functionality of other applications as plug-in transform services. For a long time, I used one that took the selected text, typeset it with pdflatex, and embedded the source as metadata for reversing the transform. Suddenly, every rich text view on the system could include typeset mathematics.
          • Beyond AppleScript, there are a bunch of other nice integrations with the command line. The pbpaste and pbcopy commands let you exchange typed data with the clipboard (so you can pipe things through ImageMagick, for example, and pop the transformed result back on the clipboard). The open command will open a file with the default app (or, optionally, an app you specify), open . will open the current directory in the Finder, if you want to do some things that are easier in the GUI. The terminal and GUI worlds are easy to move between.
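
          A minimal sketch of that last kind of round trip, driven from Python via subprocess; it assumes a Mac with pbpaste, pbcopy, and open on the PATH, and the upper-casing stands in for whatever real transform you’d pipe the clipboard through:

          ```python
          import subprocess

          def read_clipboard() -> str:
              # pbpaste prints the current clipboard contents to stdout
              return subprocess.run(["pbpaste"], capture_output=True, text=True, check=True).stdout

          def write_clipboard(text: str) -> None:
              # pbcopy replaces the clipboard with whatever arrives on stdin
              subprocess.run(["pbcopy"], input=text, text=True, check=True)

          if __name__ == "__main__":
              write_clipboard(read_clipboard().upper())  # stand-in for an ImageMagick-style transform
              subprocess.run(["open", "."], check=True)  # open the current directory in the Finder
          ```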

          It’s not perfect and I could list a lot of things that could be improved in macOS, but GNOME always feels like someone has seen OS X through a telescope and tried to copy it. They’ve captured the look but little of the underlying behaviour that makes it a platform I actually enjoy using.

          [1] One of the first things I noticed when I started using Windows for the first time after a decade or so was that I couldn’t drag a slide from one PowerPoint presentation to another (this is now fixed), which worked perfectly on the Mac version of PowerPoint.

          1. 4

            I notice in general an awful lot of people focus way too much on the “g” and not enough on the “ui”. (This is also one of my criticisms of Wayland!) When I see a new gui library and they’re talking about “rendering” it is almost always a pass - it might look ok but it just doesn’t work right. Making it work right takes a lot of time… and a coherent, focused vision to make it consistent, both things not easy to find among open source projects… or even commercial projects for that matter.

            I’ve not used too much Mac, but when I do, I really appreciate that menu search, it is legitimately great.

          2. 3

            I will acknowledge all those points but just say that Gnome is working towards that goal, even as a bunch of people give them shit for it. And the Wayland transition hasn’t helped things.

            Maybe some day Gnome will be closer, but it’s still something, and I vastly prefer it over KDE myself.

            1. 4

              I’d be more willing to believe GNOME were working towards those things if they weren’t among the things that were my objections to GNOME when we started Étoilé, twenty years ago.

              1. 2

                I’m not sure how I missed your connection to Étoilé up until now, but now that you’ve mentioned it I feel compelled to say that I’ve always found that project extremely interesting, but also difficult to engage with. I know about it because I used to follow GNUStep and its progress tracker religiously back before Mac OS X was even in beta, but I have unfortunately never actually gotten it running. (I still kick myself for not investing in Apple stock instead of a Sawtooth G4 + Mac OS X Server when StepWise was quoting me ~$8 a share, but that’s a different if adjacent story).

                Is the Étoilé dream still alive? Or has it dispersed into other projects that I should be looking into?

                1. 3

                  I came to the conclusion that some of the things I wanted to do were not feasible with current hardware. For example, I wanted users to be able to share documents for collaborative editing, and those documents to be able to include code (in some end-user programming environment) that invoked native libraries. That requires a level of sandboxing that isn’t feasible with MMU-based isolation and a level of close communication that isn’t feasible with SFI (things like WebAssembly). So I worked on hardware that can do it. We’ll be shipping microcontrollers next year and hopefully we or others will get to application cores not long after.

                  Quentin and Eric made CoreObject usable on iOS and a few apps used it. I wish Apple had adopted it. A uniform set of model objects that you can use for distributed editing and unlimited persistent undo would have made iOS so much nicer.

                  1. 1

                    We’ll be shipping microcontrollers next year and hopefully we or others will get to application cores not long after.

                    Étoilé 2.0 laptops soon 🙏

                    Quentin and Eric made CoreObject usable on iOS and a few apps used it. I wish Apple had adopted it. A uniform set of model objects that you can use for distributed editing and unlimited persistent undo would have made iOS so much nicer.

                    There’s some kind of CloudKit integration with Core Data, but it’s definitely not as elegant sounding.

        2. 2

          See, I find the opposite: most non-technical, everyday users prefer Gnome over KDE, especially since MacOS and smartphone UIs are more popular these days.

          My sample size is small, so idk, but like one of the people is also someone who bought an iphone, absolutely hated it and then bought another android. (I saw that iphone just sitting on his desk for two months straight and was like “do you wanna sell that thing?” and now it has been sitting on my desk barely used for almost a year…. but eh, it was my test hardware for something I blogged about earlier this month, so it isn’t completely unused! I actually think I might switch to it when my current phone dies, but that could be many years still.)

          KDE might be windows-like, but it is too busy, too configurable for your average user. People care about apps, and getting things done.

          Some people. Maybe even most, but the title of this article is “for All”. The people I set up kubuntu for aren’t technical at all but they like the generally familiar look and enjoy playing with the configuration options.

          (I said this in another comment thread recently on a different link, but I do think the config options are a support burden, since you can’t consistently be like “look at the lower left of the screen” or similar, but I’m not convinced they’re a problem for the users themselves - just hopefully there’s an “undo” button for when something disappears!)

          Again, my sample size is very small too, but tbh I haven’t seen any rigorous study, so if it comes down to my small personal experience vs the small personal experience of someone else on the internet, about the only thing I’ll say with confidence is that a project aiming to be “for all” has an uphill battle ahead of it.

          Even I, as a power user, prefer Gnome over KDE. It gets out of my way and lets me get stuff done. I never fight it, it just works, and it’s pleasant to look at. It’s calm and not eye straining.

          I personally use neither; they’re both suboptimal at best.

          I hate how both sides rag on each other constantly for not using the same UI as them.

          I guess you agree “a desktop for all” is an uphill battle too :)

          1. 1

            Ok, I will give you the “for all” part. I don’t think there ever will be a for all. Not everyone likes windows, not everyone likes Mac, and they have no choice there.

            On Linux you get choice and because of that, there will never be one true way.

            But we can definitely make it easier to onboard between all the options.

            I think Distros need to find a way to have Gnome/KDE/other WMs be chosen at user creation and stop having all three Spins and duplicate Distros just to change WMs.

            I know it’s not an easy problem to solve, but it should be something to strive for. Or Gnome/KDE need to go the Zorin route and support multiple layout presets.

            1. 2

              I think Distros need to find a way to have Gnome/KDE/other WMs be chosen at user creation and stop having all three Spins and duplicate Distros just to change WMs.

              That’s how it has worked on Slackware Linux for as long as I have used it! They used to include them all in the default install, so you could even pick the one you wanted when you logged in. Nowadays though, the default install no longer includes gnome - it has kde, xfce, and a number of the traditional smaller window managers.

              One of the reasons why they changed is just that it was like an extra gigabyte of download for things their typical user wouldn’t use; they’d pick one or the other and leave the rest just sitting there.

              But this is actually why I was able to easily pick neither lol, it was just a case of running one of the other already-installed options.

        3. 2

          We really need to get away from this KDE vs Gnome war.

          Not that hard. I steer non-technical users to Cinnamon, which is a simple, stable environment modeled after ‘classic’ desktops with a taskbar with a menu button.

          It was developed by Linux Mint, generally regarded as one of the most user-friendly distros and also one of the most popular. So I’d say it is a well-known alternative.

          1. 1

            I loved cinnamon, but it stagnated for a long time. Now I need it to get Wayland support to be willing to suggest it.

      3. 5

        I’m a long time KDE fan (like late 1990s) and have constantly been surprised by how much traction GNOME seems to get from the major vendors despite being a bit more radical versus the UI paradigms casual users are more accustomed to. That’s not to say there is anything wrong with GNOME, but it feels a lot different than Windows and that is a potential barrier and they’ve committed to pretty significant departures from the GNOME 2 style UI.

        1. 5

          but it feels a lot different than Windows

          ~60% of the world is on mobile (according to statcounter), and the number of smart TVs that not-programming-forum people don’t mind using (or can stand using) is increasing too. At some point, being Windows-like is not going to be a benefit compared to being app-ish / page-ish. (Even Microsoft knows it’s coming, hence Windows 8, though they jumped in a bit early.)

          KDE and Gnome both have ecosystems that target Phone, Tablet, and Desktop use (with KDE also targeting TVs).

          1. 2

            Sure and closer to 100% of PC users would have a smartphone, tablet, or TV with some alternative UI. In my humble opinion, those UIs are great on their respective form factors but suck for the desktop where the Windows (or to split hairs you can call it original UNIX desktop) UI works fine. And yes I agree KDE scales fine to TV or convertibles, I have used it on both.

            KDE incorporated a lot of minor but nice UI improvements that seem conspicuously similar to features that eventually landed in Windows, and I understand a lot of people have negative connotations around Windows in general, so I do not think KDE is a Windows work-alike nor bound to repeat Windows’ mistakes or lethargy… but it also doesn’t feel so different just for the sake of being different, like GNOME3+.

      4. 1

        Yep. I don’t even prefer kde on my computers, but it’s the least surprising. I think gnome is intuitive enough to be learnable, but it’d be jarring like the windows 8 metro menu for the non-technical people in my life, I think.

        So I guess that means for better or worse, SteamOS is the immutable flatpak oriented Linux desktop for the masses and that’s weird!

      5. 1

        What’s a good distro that ships with a KDE desktop?

        1. 4

          Fedora has a KDE variant, for both the workstation and atomic versions (Kinoite).

        2. 1

          NixOS has Plasma 5 and 6.

      6. 1

        I have been using Gnome on my Laptop (for quite some time) and KDE on my Steam Deck. I personally like the look and feel of Gnome much more. KDE always feels a little bit too condensed and sometimes cluttered, some would say functional. But in the end both are great and I like to use both.

    4. 52

      I have a simple way of looking at it. If I, as a reader, am allowed to read a web page, then I am allowed to read it in whatever user agent I like. Most of the time, this would be with a browser on my computer. But there is nothing wrong with using something else. So if I want to use your service as a user agent, and read the page from within that app, why would that be wrong?

      Archiving is the same. I can save a page on my hard drive, or on some cloud storage, so why not on your cloud service?

      But this only works if you perform these actions directly on behalf of a user. If you move outside that, and, for instance, start using the archive for analytics, then it becomes questionable. IMHO.

      1. 7

        So if I want to use your service as a user agent, and read the page from within that app, why would that be wrong?

        Well, for one, my service would not reproduce the page 100% as the author intended. If the author had a script or an ad or a “subscribe to my newsletter” form, it may not render correctly. Sure, it’s like using lynx or just a different/outdated/customized browser, but here I, the author of the reader, have made that decision on behalf of the user, and I also benefit from the content because the user stays on my website. Moreover, the user might not even realize that they are missing some content.

        1. 48

          HTML is specifically designed not to allow “as author intended” to be meaningful. The user agent serves the user.

        2. 17

          Are you making money from removing their ads and replacing them with your own? Even inadvertently?

        3. 1

          I also benefit from the content because the user stays on my website.

          What benefit do you get from the user staying on your website? Doesn’t every page accessed by the user cost you money?

          1. 2

            I’m building a product and I want users to use it. So the number of users and the amount of time they spend on my service is a benefit for me. It’s the same kind of reasoning by which Google tries to show its own widgets, “answers”, etc. instead of encouraging people to leave Google and visit someone else’s website.

            1. 4

              indeed, and I would be wary of using Google as the standard for ethics.

              how concretely do you expect to offset the costs associated with having users on your site? are users conscious of this flow of value or are they led to believe that the service is offered freely for their benefit alone?

              1. 2

                I plan to monetize the service. Reading, subscribing, and following users would be free; creating lists, commenting, exporting lists/favorites, and starting a new blog at Minifeed (it’s also a simple blogging platform, yeah) would be behind a cheap subscription.

          2. 2

            But they’re getting fame. Name recognition. “Personal brand” awareness. Sidelinks to projects they’d like to promote.

            What benefit does the user get? Not sure, they “just wanted the article”. Probably.

    5. 1

      The big trend seems to be trying to write SQL queries on the FE. It started with graphql and is now going into building a reactive sql db on the FE that syncs with the remote sql db.

      It’s definitely a cool idea, but the reactivity model is what makes the solution difficult and where something like redux – or any other reactive in-memory db – really shines.

      The complexity on the FE is not state management, rather side-effect management. All these libraries and frameworks are focusing on the wrong problem.

      It’s one big reason why I’ve been trying to develop a flexible side-effect system that leverages structured concurrency: https://starfx.bower.sh

      1. 11

        The big trend seems to be trying to write SQL queries on the FE

        Oh, this triggered a flashback!

        When I was young, very inexperienced, and a PHP developer, the whole AJAX technique with XMLHTTPRequest was not widely known yet. This was even before Google Maps popularized it, but it did exist, and I had found it somewhere on an obscure webpage that left me flabbergasted. It updated the page without reloading. Sorcery!

        After finding out how it worked, my colleague and I immediately decided to apply it to our current project. Only to quickly discover that this meant a lot more work. We were duplicating functionality on the frontend and the backend. So the next logical step was…

        Write a MySQL client in JavaScript! Just write queries in the frontend and deal with the data there. It worked splendidly!

        Only years later did I realize that I had created a massive security hole, because there was no authentication at all. Basically, anyone on the web could have done a DROP DATABASE and it would have worked. Did I mention this was an app that handled medical data?

        As I said, I was very young. The company was very slow to refactor this, because by the time it was pointed out, there was so much technical debt on top of that client that it would take too much to fix. According to management. But fortunately, soon after that there was a penetration test as part of some certification process and I tipped the pen tester off about where to look for problems. He probably would have found it himself, but just to be sure. After that, the company was forced to fix it.

    6. 23

      Yeah, I quite dislike the “Choose Boring Technology” garbage I see. It’s all based on what I consider to be a field-wide straw man that reminds me quite a lot of how a decade ago people used to really shit on frontend devs even harder than they do today. Even in this lobsters thread there are terms like “Resume Driven Development” - something that I think is extremely uncommon, but the way it’s talked about you’d think it’s used every day by half of all developers.

      I hate “innovation tokens” as a term because I think they’re a terrible, muddying proxy. Operability, documentation, reliability, etc, are not complicated concepts (edit: they are complicated, but they’re the sort of things we should be comfortable talking about as professionals) and we should just talk about them like adults. “Newness” is a terrible proxy word, “Hype” is a garbage term and shouldn’t be taken seriously. A technology being “new” has some interesting correlated properties like “may correlate with being less tested” but you can just talk about whether it’s tested without talking about it being new. Using these nonsense terms that confuse and wrap up many of these properties is unhelpful in a serious field where you are presumably talking to people who have the training necessary to evaluate properties directly.

      Cardinality of technologies as a proxy for “complexity” is also reductive. Usually people just wrap up groups of technologies into ‘bubbles’ at the boundary where their understanding fails. The Linux kernel is so fucking massive in terms of its cardinality of features but no one talks about it that way, you don’t say “oh we should use the boring 2.6 kernel because 5.1 has so many new, hyped features” - largely because those new features are extremely useful and innovative. People “wrap up” everything in the Linux kernel into one “technology” because very few people understand or think about the kernel.

      This makes it difficult to ask questions like “should we use Redis or Cassandra” because you’re unable to talk about things like consensus, transactions, etc, and instead you’re left with “Cassandra feels more complex”. Any useful conversation will involve discussing how each technology’s properties can be evaluated against your requirements, and you’ll immediately throw away these silly terms if you want a productive conversation.

      I think this is why the term “complex” sucks as well, same with “simple”. What most people mean is that a given technology falls somewhere on an axis of “cardinality of properties” and “novelty of properties”. If you actually talk about cardinality and novelty you can have a really productive conversation - “most of the team hasn’t used this, it’s very novel. can we take the hit of the novelty?”, “there’s a lot of moving parts here, can we take the hit of owning each one?” but as soon as you use words like “simple”, “boring”, “complex”, “innovation tokens”, etc, you throw out engineering and appeal to very tricky, intuitive, divisive thought processes.

      I am quite sure that this article will largely be rejected because the “anti hype” movement in tech is so popular, but I’m entirely sympathetic to the majority of points made and I wish we’d just talk like engineers when we evaluate technologies. These terms are simply destructive to good thinking and good engineering.

      1. 18

        Even in this lobsters thread there are terms like “Resume Driven Development” - something that I think is extremely uncommon, but the way it’s talked about you’d think it’s used every day by half of all developers.

        I’m not sure I’ve worked anywhere that obvious examples of Resume Driven Development haven’t cropped up.

        I’ve definitely been guilty of it myself earlier in my career.

        1. 3

          Can you tell me more? I mean, for example, I chose Rust for my first project ever at my first job. But it was a throw away project, we ran the program a few times and then ditched it. I think that’s harmless and fosters learning and low-touch experimentation with new tech. And I didn’t do it for my resume, I did it for fun/ learning experience.

          I’ve seen things like “oh we’re using Go now, well that’s a good tech to learn for my career”.

          Someone just packing new technologies onto a resume is something I haven’t seen really.

          1. 8

            This kind of thing does happen in real life and it’s not really one person trying to pack a resume, but a sort of cargo cult that sweeps entire teams off their feet.

            A company I worked for rewrote an entire major product in Elixir when Elixir was about 1 or 2 years old. No one on the team knew Elixir or had ever worked with it. The original was written in Ruby and the company already had about 50 people on other teams writing Ruby as the entire engineering team was Ruby, so it was the “boring” choice.

            I advised against the rewrite while I was there and was able to halt this decision for a while. Having helped write the original codebase, I knew it had issues, but at least we knew the issues. I told them if it was to be redone, at least do it in the language everyone else knows.

            This was done purely out of “innovation” for the sake of it. People thought the codebase needed a rewrite so might as well do it on something new and exciting. They also thought something “fun” would help attract talent. I left the team shortly afterwards to go work on another project.

            Fast forward a few years, and I came to find out the rewrite went totally bust because: they couldn’t find people to work on the project and expand the team (which I told them); couldn’t grow expertise in the language within the company (which I warned); couldn’t plug the gaps when people left the team.

            After all was said and done the project was rewritten yet again, in Ruby.

            I bet many devs left with a “I’ve written Elixir professionally for 2 years” on their resumes when the project really went south because of it. No one is the wiser.

            1. 5

              I think that people are eager to rewrite systems for a lot of reasons - a genuine belief it’s better (and they may be right), an inability to understand a system that predates their time, a desire to do more fun work, etc. I just haven’t seen “pad my resume” be one of them.

              That said, it’s not really central to my post. Whatever happened at that company, I’m simply in no position to make a judgment.

              1. 8

                “Pad my resume” sounds very deliberate. I think it is more about FOMO and trying to “stay relevant”.

          2. 4

            I’ve seen this first hand when a relatively entrenched enterprise decided that it was a good idea to let product management have a parallel project to some other keystone product. The second system was designed from the ground up with all the tech that was in vogue at the time of its conception: noSQL, more microservices than dev teams, golang, a heavy frontend in angular. From the get-go it was not supposed to ever replace the legacy keystone system, instead … just be there? Developed as an exercise in vogue, in service to their own designers.

            Within 3 or 4 years it grew to roughly 1000 devs before someone finally realized it was terrible business, and killed the second project. It delivered an MVP so thin we got laughed at by some of the fattest customers. A few select components were/are being retrofitted to the older system, but for reasons that amount to financial/legal/political, not technical.

            In hindsight it’s more about management failure, but one of the factions was motivated by, and hired to work on, new shiny things.

            1. 2

              Yeah so as I said in an adjacent post, “new shiny things” is not “resume driven” to me and I do think people are absolutely driven by “new shiny”.

      2. 3

        Operability, documentation, reliability, etc, are not complicated concepts and we should just talk about them like adults. … A technology being “new” has some interesting correlated properties like “may correlate with being less tested” but you can just talk about whether it’s tested without talking about it being new.

        I sympathize with this much, even if not the calling things “garbage” and “nonsense”.

        I think this is why the term “complex” sucks as well, same with “simple”. … as soon as you use words like “simple”, … you … appeal to very tricky, intuitive, divisive thought processes. … These terms are simply destructive to good thinking and good engineering.

        I suggest that concluding sentence could do without the “simply” that is not adding much but irony.

        1. 5

          I’m not saying the term “simply” can’t be used. I’m saying it’s destructive as a term when discussing technology.

          Specifically, when evaluating technology. I think there’s an interesting philosophical discussion about simplicity to be had, it’s just that it’s the sort of conversation I think is best over drinks where you can sort of have a fun little debate about things. At work, when acting as engineers, toss those terms out.

    7. 2

      With Thunderbird’s remit spanning email, calendar, contacts, and tasks, my biggest hope for the future is that we can have a MUA for Android that is able to create and handle iCalendar messages. Why is that exciting? Well, it decouples calendar/tasks from your email service, just like on desktop. You could use third-party CalDAV or self-host… or even better, you could use the DecSync/SyncThing ecosystem to have a calendar that syncs directly between your mobile/desktop/laptop without any internet server. To be practical you need that tight integration between the mail and calendar parts of the client, and most people will want it on their phone as well as their desktop.

      1. 2

        It makes me sad that every Android calendar tool I’ve found that supports non-Google calendar sources is always CalDAV, not just “export to/import from an iCalendar file in a place that SyncThing can see it”.

        1. 4

          Is this what you are looking for: https://icsx5.bitfire.at/ ? It is one-way though (read-only).

    8. 36

      Automated form filling for medical use sounds like a good way of killing people. LLMs do not translate from unstructured data to structured data; they create a token stream that has a high probability of conforming to the desired schema and a high probability of being a valid mapping from the input. These probabilities are determined by their training data. If you have very unusual symptoms, for example, there’s a higher chance that the LLM will determine that they are outside of the probability space allowed and replace them with something more common.

      The numbers for the film also seem to assume that you get the exact image that you want first time. So far, the short films made using diffusion models have required a lot of attempts at the prompts and a lot of post processing.

      The prices also ignore the fact that hosted models are currently aggressively subsidised. The cost of something like ChatGPT is an order of magnitude more than OpenAI charges, which is why they’re running at a multi billion dollar a year loss (hoping that they will lock everyone in and then be able to lower their costs). If the price per token went up by a factor of ten, some of these calculations change.

      1. 5

        It’s also a disincentive to fix the real problem – which is a process that involves forms that people can’t traverse in the first place

        I already feel like health care (in the US) is filled with “policy slop”, i.e. mainly insurance

        So throwing some LLM slop at the policy slop might work in some cases, but it’s a very dystopian feature

        It’s basically different bureaucracies fighting each other through obfuscated policy, with patients caught in the middle

        1. 12

          I find that’s the case with a lot of LLM things. If an LLM can generate something correctly, it suggests that that thing is probably redundant. I’ve found that they’re quite good at identifying missing abstractions in frameworks.

          1. 2

            This is actually how I decide whether to use an LLM these days. If I am working and the thought ‘this is boring’ pops up, I immediately follow up with ‘maybe I can get an LLM to do it for me’.

      2. 2

        If you have very unusual symptoms, for example, there’s a higher chance that the LLM will determine that they are outside of the probability space allowed and replace them with something more common.

        Don’t humans and in particular doctors do precisely that? This may be anecdotal, but I know countless stories of people being misdiagnosed because doctors just assumed the cause to be the most common thing they diagnose. It is not obvious to me that LLMs exhibit this particular misjudgement more than humans. In fact, it is likely that LLMs know rare diseases and symptoms much better than any human.

        1. 4

          Yes, doctors misdiagnose things all the time, sometimes for that reason. But there’s a huge range in how often they do it, depending on how good they are. (Yes, I am an epidemiologist.) So it’s not at all obvious WHICH human standard we should be comparing machine-learning systems to.

          1. 1

            I love how two folks debate something I neither claimed in my blog post nor is the central point of my blog post.

        2. 2

          What do you mean by knowing?

          1. 1

            That the information is encoded in the weights of the LLM and thus appears in its replies. No human can know the entire medical literature, but you can just feed it into the LLM training data.

      3. 1

        Automated form filling for medical use sounds like a good way of killing people.

        Filling forms automatically for insurance payments and then getting a human to verify them kills people? Can you please elaborate?

        1. 4

          What makes you think they’d bother with the human rather than substitute another AI product?

          1. 1

            What makes you think they’d bother with the human rather than substitute another AI product?

            They might. Your guess is as good as mine.

            1. 3

              The flat fact of the matter is that the industrial revolution happened because machines were more reliable and cheaper than people. The computer revolution happened because computers were more reliable, faster, and (eventually) cheaper than people. The AI revolution looks like it’s going to fall on its face outside of a few very specific areas, because it’s cheaper to pay people to do a job that is orders of magnitude more accurate than AI. The beast makes too many mistakes, digs too many holes, opens too many important people up for liability lawsuits. It costs too much to train an AI model to be even remotely passable at something, the flaws are inherent to the model, and it requires too much actual human work making sure that the AI did its job right to be feasible.

      4. 1

        These probabilities are determined by their training data. If you have very unusual symptoms, for example, there’s a higher chance that the LLM will determine that they are outside of the probability space allowed and replace them with something more common.

        Have you seen examples of this happening?

        I understand the theory here, but I’ve been experimenting with structured data extraction using LLMs for over a year and I’ve not seen examples of this kind of problem myself.

        The intuition I’ve built up over time is that LLMs are really good at duplicating tokens from their input in their output stream when they need to.

        1. 4

          I guess it depends on the input. If you’re taking a structured form that’s already on a computer and generating some other structured output, you don’t need an LLM, you need half a dozen lines of Python. I was assuming that this was going from something like a handwritten form or verbal dictation. In both of those cases, I have seen (recent) deep learning models replace ambiguity with common tokens, giving something that looks fine but is actually nowhere near the original.
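
          For the already-structured case, a sketch of what those half a dozen lines of Python might look like; the field names and schema here are hypothetical, and the point is just that a deterministic mapping fails loudly on anything unexpected instead of silently substituting a more common value:

          ```python
          # Map an already-structured intake record onto a claim form.
          # All field names are made up for illustration; no LLM involved.
          FIELD_MAP = {"patient_name": "name", "dob": "date_of_birth", "icd10": "diagnosis_code"}

          def to_claim_form(intake: dict) -> dict:
              missing = [src for src in FIELD_MAP.values() if src not in intake]
              if missing:
                  raise ValueError(f"missing fields: {missing}")  # refuse to guess
              return {dst: intake[src] for dst, src in FIELD_MAP.items()}

          print(to_claim_form({"name": "Jane Doe", "date_of_birth": "1980-01-01", "diagnosis_code": "J45.909"}))
          ```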

          1. 1

            the use case would be “free text -> JSON” or other structured output, in cases where the input doesn’t necessarily belong to a known grammar (people put all sorts of stuff in spreadsheets). OCR/speech recognition just adds a level of ambiguity.

            Some relevant-looking benchmarks here, with GPT-4o boasting a nice round 100% accuracy: https://github.com/stephenleo/llm-structured-output-benchmarks?tab=readme-ov-file . But again, as OP mentions, it is left to be seen how this works on out-of-distribution data.
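
            For the free-text case, a minimal sketch of the shape of that pipeline, with a hypothetical `llm` callable standing in for whatever model API is used; the schema check at the end is the interesting part, since out-of-distribution input is exactly where a plausible-looking but wrong record would otherwise slip through:

            ```python
            import json

            EXPECTED_KEYS = {"name", "date_of_birth", "symptoms"}  # hypothetical target schema

            def extract_record(free_text: str, llm) -> dict:
                # `llm` is any callable that takes a prompt string and returns a string.
                prompt = (
                    "Return only a JSON object with exactly these keys: "
                    f"{sorted(EXPECTED_KEYS)}. Copy values verbatim from the text.\n\n{free_text}"
                )
                record = json.loads(llm(prompt))  # fails loudly if the output is not valid JSON
                if set(record) != EXPECTED_KEYS:
                    raise ValueError(f"schema mismatch: {sorted(record)}")
                return record

            # Fake model for demonstration; a real one can and will get fields subtly wrong.
            fake = lambda _prompt: '{"name": "Jane Doe", "date_of_birth": "1980-01-01", "symptoms": "wheezing"}'
            print(extract_record("Jane Doe, born 1980-01-01, presents with wheezing.", fake))
            ```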

    9. 14

      NOYB does great work, but it is wrong on this one. Privacy Preserving Attribution is a sensible, well-designed feature and a giant step in the right direction for balancing privacy and the primary economic model of the Web, ads. I wrote more about why at https://alpaca.gold/@Jeremiah/113198664543831802

      1. 32

        I don’t know the specifics of NOYB’s complaint, but reading your thoughts, I think you’re missing an important point:

        Sending data over the web, at the very least, leaks your IP, possibly your device type as well. It doesn’t matter how anonymized the data contained in the envelope is. Making a web request that sends some data, any data, will always be leaky compared to making no web requests, which means that the user needs to trust the endpoint the browser is communicating with.

        And this is also where NOYB’s complaint may have merit, because any service would work just fine without those PPA requests. And let’s be clear, PPA is relevant for third-party ads, less for first-party ones. In other words, user data is shared with third parties, without the user expecting it as part of the service. Compared with browser cookies, a feature that enables many legitimate uses, PPA is meant only for tracking users. It will be difficult for Mozilla or the advertising industry to claim a legitimate interest here.

        Another point is that identifying users as a group is still a privacy violation. Maybe they account for that, maybe people can’t be identified as being part of some minority via this API. But PPA is still experimental, and the feature was pushed to unsuspecting users without notification. Google’s Chrome at least warned people about it when enabling features from Privacy Sandbox. Sure, they used confusing language, but people that care about privacy could make an informed decision.

        The fact that Safari already has this feature on doesn’t absolve Firefox. Apple has its issues right now with the EU’s DMA, and I can see Safari under scrutiny for PPA as well.


        Don’t get me wrong, I think PPA may be a good thing, but the way Mozilla pushed this experiment, without educating the public, is relatively disappointing.

        The reason I dislike Chrome is that it feels adversarial, meaning that I can’t trust its updates. Whenever they push a new update, I have to look out for new features and think about how I can get screwed by them. For example, at least when you log into your Google account, Chrome automatically starts sharing your browsing history with the purpose of improving search and, according to the ToS, they can profile you as well, AFAIK.

        Trusting Firefox to not screw people over is what kept many of its users from leaping to Chrome, and I had hoped they understood this.

        The least they could do is a notification linking to some educational material, instead of surprising people with a scary-looking opt-out checkbox (that may even be problematic under GDPR).

        1. 4

          Sending data over the web, at the very least, leaks your IP, possibly your device type as well. It doesn’t matter how anonymized the data contained in the envelope is. Making a web request that sends some data, any data, will always be leaky compared to making no web requests, which means that the user needs to trust the endpoint the browser is communicating with.

          The problem with this is that it claims too much. You’re effectively declaring that every web site in existence is in violation of GDPR, because they all need to know your IP address in order to send packets back to you, which makes them recipients and processors of your personal data.

          This sort of caricature of GDPR is one reason why basically every site in Europe now has those annoying cookie-consent banners – many of them are almost certainly not legally required, but a generic and wrong belief about all cookies being inherently illegal under GDPR without opt-in, and a desire on the part of industry for malicious compliance, means they’re so ubiquitous now that people build browser extensions to try to automatically hide them or click them away!

          1. 22

            Sorry to say this, but this is nonsense.

            The GDPR acknowledges that the IP is sent alongside requests, and that it may be logged for security purposes. That’s a legitimate interest. What needs consent is third-party tracking with the purpose of monetizing ads. How you use that data matters, as you require a legal basis for it.

            Cookies don’t need notifications if they are needed for providing the service that the user expects (e.g., logins). And consent is not needed for using data in ways that the user expects as part of the service (e.g., delivering pizza to a home address).

            The reason most online services have scary cookie banners in the EU is because they do spyware shit.

            Case in point, when you first open Microsoft Edge, the browser, they inform you that they’re going to share your data with over 700 of Microsoft’s partners, also claiming legitimate interest for things like “correlating your devices” for the purpose of serving ads, which you can’t reject, and which is clearly illegal. So Microsoft is informing Edge users, in the EU, that they will share their data with the entire advertising industry.

            Well, I, for one, would like to be informed of spyware, thanks.

            1. 8

              The GDPR acknowledges that the IP is sent alongside requests, and that it may be logged for security purposes. That’s a legitimate interest. What needs consent is third-party tracking with the purpose of monetizing ads.

              Luckily for Mozilla, PPA does not do “third-party tracking with the purpose of monetizing ads”. In fact, kind of the whole point of PPA is that it provides the advertiser with a report that does not include information sufficient to identify any individual or build a tracking profile of an individual. The advertiser gets aggregate reports that tell them things like how many people saw or clicked on an ad but without any sort of identification of who those people were.

              This is why the fact that, yes, technically Mozilla does receive your IP address as part of a web request does not automatically imply that Mozilla is doing processing of personal data which would trigger GDPR. If Mozilla does not use the IP address to track you or share it to other entities, then GDPR should not have any reason to complain about Mozilla receiving it as part of the connection made to their servers.

              As I’ve told other people: if you want to be angry, be angry. But be angry at the thing this actually is, rather than at a made-up lie about it.

              The reason most online services have scary cookie banners in the EU is because they do spyware shit.

              No, they do it because (as the other reply points out) they have a compliance department who tells them to do it even if they don’t need to, because it’s better to do it.

              There’s a parallel here to Proposition 65 in the US state of California: if you’ve ever seen one of those warning labels about something containing “chemicals known to the State of California to cause cancer”, that’s a Proposition 65 warning. The idea behind it was to require manufacturers to accurately label products that contain potentially hazardous substances. But the implementation was set up so that:

              • If your product is eventually found to cause cancer, and you didn’t have a warning, you suffer a huge penalty, but
              • If your product does not cause cancer, and you put a warning on it anyway, you suffer no penalty.

              So everyone just puts the warning on everything. Even things that have almost no chance of causing cancer, because there’s no penalty for a false cancer warning and if your product ever is found to cause cancer, the fact that you had the warning on it protects you.

              Cookie banners are the same way: if you do certain things with data and don’t get up-front opt-in consent, you get a penalty. But if you get the consent and then don’t do anything which required it, you get no penalty. So the only safe thing to do is put the cookie consent popup on everything all the time. This is actually an even more important thing in the EU, because (as Europeans never tire of telling everyone else) EU law does not work on precedent. 1000 courts might find that your use of data does not require consent, but the 1001st court might say “I do not have to respect the precedents and interpretations of anyone else, I find you are in violation” and ruin you with penalties.

              1. 10

                This is why the fact that, yes, technically Mozilla does receive your IP address as part of a web request does not automatically imply that Mozilla is doing processing of personal data which would trigger GDPR.

                Mozilla does not have a legitimate interest in receiving such reports from me.

              2. 9

                tell them things like how many people saw or clicked on an ad

                They can look at their web server logs?

                1. 3

                  Those are fairly useless for this purpose without a lot of cleaning up and even then I’d say it is impossible to distinguish bots from real visits without actually doing the kind of snooping everyone is against.

                  1. 7
                    • User clicks ad, browser opens company.com/showad/id112233.
                    • Company associates session with source of id112233.
                    • User buys thing (or does something else Company is optimizing for).
                    • Company attributes “conversion” to ad.

                    This requires no third party?
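
                    A sketch of that first-party flow in Python, with a hypothetical in-memory store standing in for whatever the shop already keeps server-side; the only thing recorded is which campaign a purchase followed from, not who the user is:

                    ```python
                    from collections import Counter

                    clicks = {}              # click_id -> campaign, recorded when /showad/<id> is hit
                    conversions = Counter()  # campaign -> number of attributed purchases

                    def record_click(click_id: str, campaign: str) -> None:
                        clicks[click_id] = campaign

                    def record_purchase(click_id: str) -> None:
                        # Attribute the sale to the campaign the click came from, if we know it.
                        if click_id in clicks:
                            conversions[clicks[click_id]] += 1

                    record_click("id112233", "spring-sale")
                    record_purchase("id112233")
                    print(conversions)  # Counter({'spring-sale': 1})
                    ```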

                    1. 1

                      You are not allowed to associate a session until you have permission for it, and you don’t have that on first page load if the visitor didn’t agree to it on a previous visit.

                      This whole described tracking through the website is illegal if you either don’t have a previous agreement or you don’t need a session for the pages to even work, which you will have a hard time arguing for browsing a web shop.

                      Using a third party doesn’t solve anything, because you need permission to do this kind of tracking anyway. My argument, however, was that you can’t learn how many people saw or clicked an ad from your logs, because some saw it on other people’s pages or on a search engine, for which you don’t have logs, and A LOT of those clicks are fake and your logs are unlikely to be rich enough to tell which.

                      What you want to learn about people’s behavior is more than the above, which I’m sure you’d know if this was actually remotely your job.

                      1. 8

                        What you want to learn about people’s behavior is more than above which I’m sure you’d know if this was actually remotely your job.

                        “What you want to learn about people’s behavior” is one thing, “what you should be able to learn about people’s behavior” is something else.

                        IMHO, it’s not the job of those neck-deep in the industry to set the rules of what’s allowed and not.

                        1. 2

                          I’m not sure anyone here is arguing that these are the same thing and certainly not me.

                          I’m not sure if you are implying that I am neck-deep in the ad industry, but I certainly never have been. I am, however, also responsible for user experience in our company, and there’s a significant overlap in needing to understand visitor/user behavior.

                          We go to great lengths to comply not only with the letter of the law, but also with its spirit, which means we have to make a lot of decisions less informed than we’d prefer. I am not complaining about that either, but it does bother me when every attempt to learn ethically is described as either unnecessary or sinister.

              3. 2

                If your product is eventually found to cause cancer, …

                The condition for requiring a warning label is not “causes cancer” but “exposes users to something that’s on this list of ‘over 900 chemicals’ at levels above the ‘safe harbor levels’”, which is a narrower condition, although maybe not much narrower in practice. (I also thought that putting unnecessary Prop. 65 warning labels on products had been forbidden (although remaining common), but I don’t see that anywhere in the actual law now.)

            2. 3

              No, the reason many have them is that every data privacy consultant will beat you over the head if you don’t have an annoying version of it. Speaking as someone on the receiving end of such reports.

              1. 12

                No, you must have an annoying version of it because, the theory goes, the more annoying it is, the higher the chance that users will get frustrated and click the first button they see, e.g. the “accept all” button. The job of privacy consultants is to legitimize such practices.

                1. 2

                  Which part of “Speaking as someone on the receiving end of such reports” was not clear?

                  Do you think they are trying to persuade us to have more annoying versions so we could collect more information even though we don’t want to? For whose benefit, exactly?

                  My guess is that you don’t have much experience working with them or with what those reports actually look like.

                  1. 7

                    Well, what I do know is that the average consent modal you see on the internet is pretty clearly violating the law, which means that either the average company ignores their data privacy consultants, or the data privacy consultants that they hire are giving advice designed to push the limits of the law.

          2. 7

            The problem with this is that it claims too much. You’re effectively declaring that every web site in existence is in violation of GDPR, because they all need to know your IP address in order to send packets back to you, which makes them recipients and processors of your personal data.

            Yes, IP addresses are personal data and controlled under GDPR, that’s correct. That means each and every HTTP request made needs freely given consent or legitimate interest.

            I request a website, the webserver uses my IP address to send me a reply? That’s legitimate interest. The JS on that site uses AJAX to request more information from the same server? Still legitimate interest.

            The webserver logs my IP address and the admin posts it on facebook because he thinks 69.0.4.20 is funny? That’s not allowed. The website uses AJAX to make a request to an ad network? That isn’t allowed either.

            I type “lobste.rs” into Firefox, and Firefox makes a request to lobsters? Legitimate interest. Firefox makes an additional request to evil-ad-tracking.biz to tell them that I visited lobsters? That’s not allowed.

      2. 26

        a giant step in the right direction for balancing privacy and the primary economic model of the Web, ads

        Balancing, lol. For years ad providers ignored all data protection laws (in Germany, way before GDPR) and then the GDPR. They were stalking all users without consent. Then the EU forced the ad companies to follow the law and at least ask users whether they want to share private data. The ad companies successfully framed this as bad EU legislation. And now your browser wants to help ad companies stalk you. Framing this as balancing is ridiculous.

        All it does is tell a site you have already visited that someone got to the site via an ad without revealing PII. […] which ads worked without knowing who they worked for specifically

        Just because there is no nametag on it doesn’t mean it’s not private data.

        It’s reasonable for a business to know if their ad worked

        Sorry for the bad comparison, but it’s also reasonable for a thief to want to break into your house. It’s still illegal. Processing personal data is illegal, with some exceptions. Yes, there is a “legitimate interests” exception, but it has to be balanced against the “fundamental rights and freedoms of the data subject”. I would say “I like money” isn’t enough to fall under this exception.

        Apple enabled Privacy Preserving Attribution by default for iOS and Safari on macOS 3 years ago

        “But the other one is also bad.” This could be an argument, iff you can prove that it was willfully ignored by others. There are so many vendors pushing such shit to their paying customers that I would assume this was overlooked. Also, Apple should disable it too, because as far as I can see it’s against the law (no, I’m not a lawyer).

        And no, I’m not saying ads are bad or that you shouldn’t be allowed to do some sort of customer analysis. But just as the freedom of your fist ends where my nose starts, the freedom of market analysis ends where you start stalking customers. I know it’s not easy to define where customer analysis ends and stalking starts, but currently the ad companies are miles away from that line. So stop framing them as poor little advertisers.

        1. 6

          Processing personal data is illegal

          The thing that makes me and presumably some other people sigh and roll our eyes at responses like this is that we’re talking about a feature which is literally designed around not sending personal data to advertisers for processing! The whole point of PPA is to give an advertiser information about ad views/clicks without giving them the ability to track or build profiles of individuals who viewed or clicked, and it does this by not sending the advertiser information about you. All the advertiser gets is an aggregate report telling them things like how many people clicked on the ad.

          If you still want to be angry about this feature, by all means be angry. Just be angry about the actual truth of it rather than whatever you seem to currently believe about it.

          1. 10

            The only problem I see is that Mozilla is able to track and build profiles of individuals. To some extent, they’ve always been able to do so, but they’ve also historically been a nonprofit with a good track record on privacy. Now we see two things in quick succession: first, they acquire an ad company, and historically, when a tech company acquires an ad company, it’s being reverse-acquired. Second, they implement a feature for anonymizing and aggregating the exact kind of information that advertising companies want (which they must, in the first place, now collect). PPA clearly doesn’t send this information directly to advertisers. But do we now trust Mozilla not to sell it to them separately? Or to use it for the benefit of their internal ad company?

            1. 3

              The only problem I see is that Mozilla is able to track and build profiles of individuals.

              Except they aren’t! They’ve literally thought of this and many other problems, and built the whole thing around distributed privacy-preserving aggregation protocols and random injected noise and other techniques to ensure that even Mozilla does not have sufficient information to build a tracking profile on an individual.

              And none of this is secret hidden information. None of it is hard to find. That link? I typed “privacy preserving attribution” into my search engine, clicked the Mozilla support page that came up, and read it. This is not buried in a disused lavatory with a sign saying “Beware of the Leopard”. There’s also a more technical explainer linked from that support doc.

              Which is why I feel sometimes like I should be tearing my hair out reading these discussions, and why I keep saying that if someone wants to be angry I just want them to be angry at what this actually is, rather than angry at a pile of falsehoods.
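
              To make the “random injected noise” part concrete, here is a toy sketch of Laplace-style noise on an aggregate count; this is a generic illustration of the idea, not Mozilla’s actual DAP implementation:

              ```ts
              // Toy illustration only: add Laplace noise to an aggregate count so the
              // advertiser sees something close to the real total, but no individual
              // contribution can be reliably inferred. Not the real DAP/Prio protocol.

              function laplaceNoise(scale: number): number {
                // Inverse-CDF sampling of a Laplace distribution centred on 0.
                const u = Math.random() - 0.5;
                return -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
              }

              // conversions: one 0/1 value per participating browser, never revealed individually.
              function noisyConversionCount(conversions: number[], epsilon = 1.0): number {
                const trueCount = conversions.reduce((sum, c) => sum + c, 0);
                // The sensitivity of a counting query is 1, so the noise scale is 1/epsilon.
                return trueCount + laplaceNoise(1 / epsilon);
              }
              ```

              The reported number is close enough to the true count to be useful in aggregate, while any single browser’s contribution is hidden in the noise.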

              1. 3

                Except they aren’t! They’ve literally thought of this and many other problems, and built the whole thing around distributed privacy-preserving aggregation protocols and random injected noise and other techniques to ensure that even Mozilla does not have sufficient information to build a tracking profile on an individual.

                How do I actually know that Mozilla’s servers are implementing the protocol honestly?

                1. 2

                  How do I actually know that Mozilla’s servers are implementing the protocol honestly?

                  How do you know anything?

                  Look, I’ve got a degree in philosophy and if you really want me to go deep on whether you can know things and how, I will, but this is not a productive line of argumentation because there’s no answer that will satisfy. Here’s why:

                  Suppose that there is some sort of verifier which proves that a server is running the code it claims to be; now you can just reply “ah-ha, but how do I trust that the verifier hasn’t been corrupted by the evil people”, and then you ask how you can know that the verifier for the verifier hasn’t been corrupted, and then the verifier for the verifier for the verifier, and thus we encounter what is known, in philosophy, as the infinite regress – we can simply repeat the same question over and over at deeper and deeper levels, so setting up the hundred-million-billion-trillionth verifier-verifier just prompts a question about how you can trust that, and now we need the hundred-million-billion-trillion-and-first verifier-verifier, and on and on we keep going.

                  1. 5

                    How do you know anything?

                    This is an excellent question, and frankly the basis of my opposition to any kind of telemetry bullshit, no matter how benign it might seem to you now. I absolutely don’t know whether it’s safe or unsafe, anonymous or only thought to be anonymous. It turns out you basically can’t type on a keyboard without somebody being able to turn a surprisingly shitty audio recording of your keyboard into a pretty accurate transcript of what you typed. So many papers have demonstrated that a list of the fonts visible to your browser can often uniquely identify a person. Medical datasets have been de-anonymised just by using different bucketing strategies.

                    I have zero confidence that this won’t eventually turn out to be similar, so there is zero reason to do it at all. Just cut it out.

                    1. 1

                      If there’s no amount of evidence someone could present to convince you of something, you can just say so and let everyone move on. I don’t like arguing with people who act as if there might be evidence that would convince them when there isn’t.

                      1. 7

                        It’s a perfectly legitimate position to hold that the only valid amount of leaked information is zero. You’re framing it as if that was something unreasonable, but it’s not. Not every disagreement can be solved with a compromise.

                      2. 4

                        I prefer to minimize unnecessary exposure. If I visit a website, then, necessarily, they at a minimum get my IP address. I don’t like it when someone who didn’t need to get data from me, gets data from me. Maybe they’re nice, maybe they’re not nice, but I’d like to take fewer chances.

          2. 9

            Trust lost is hard to regain. The ad industry is obviously in a hard place here.

            1. 9

              The thing that really leaves a bitter taste in my mouth is that it feels like “the ad industry” includes Mozilla now.

              1. 3

                shouldn’t be surprising; it’s been their #1 funding source for years…

      3. 20

        I like your take on this, insomuch as “it’s better than what we currently have”.

        It’s reasonable for a business to know if their ad worked.

        I don’t agree with this; it wasn’t even possible to know until about 20 years ago. The old ad-man adage goes that “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” Well, that’s just the price you pay when producing material that is hardly ever a benefit to society.

        Funnily enough there does seem to have been a swing back towards brands and publishers just cutting all middle men out and partnering up. This suggests to me that online ads aren’t working that well.

        1. 4

          This to me is so incredibly naive, and I’m speaking as someone who doesn’t like ads. How in the world would anyone hear about your products and services without them, especially if they are novel?

          Imagining that every company, no matter how small or new, sits on tons of money it can waste on ineffective spending seems unreasonable. Having ads be an option only for companies that are already successful enough doesn’t seem particularly desirable from the point of view of the economy.

          I’m as much against snooping, profiling and other abuses as the next guy, but I disagree with seeing all tracking, no matter how privacy-preserving, as inherently bad.

          1. 13

            If your company can’t survive without ad tech, it should just cease to exist.

            1. 4

              Why? Justify that. What is it about a company requiring advertising that inherently reduces the value of that company to 0 or less? If I have a new product and I have to tell people about it to reach the economic tipping point of viability, my product is worthless? Honestly, I find this notion totally ridiculous - I see no reason to connect these things.

              1. 9

                I never said anything about advertising, I said ad tech. Go ahead and advertise using methods that don’t violate my privacy or track me in any way.

                1. 2

                  Now you’re conflating “ad tech” with tracking. And then what about tracking that doesn’t identify you?

                  1. 3

                    What do you think the ad tech industry is? And I simply do not consent to being tracked.

                    1. 1

                      So if an ad didn’t track you you’d be fine with it? If an ad tech company preserved your privacy, you’d be fine?

                      1. 5

                        I am fine with ads that are not targeted at me at all and don’t transmit any information about me to anyone. For example, if you pay some website to display your ad to all its visitors, that is fine by me. Same as when you pay for a spot in a newspaper, or a billboard. I don’t like it, but I’m fine with it.

          2. 9

            It’s absolutely naive, and I stand by it because I don’t care if you can’t afford to advertise your product or service. But I do find ads tiresome, especially on the internet. Maybe I’m an old coot but I tend to just buy local and through word of mouth anyway, and am inherently put off by anything I see in an ad.

            Imagining that every company, no matter how small or new sits on tons of money they can waste on stuff that is ineffective seems unreasonable. Having ads be an option only for companies already successful enough doesn’t seem particularly desirable from point of view of economy.

            This is pretty much the state of affairs anyway. Running an ad campaign is a money-hole even in the modern age. If I turn adblock off I just get ads for established players in the game. If I want anything novel I have to seek it out myself.

            But as I said, I’m not against this feature per-se, as an improvement on the current system.

            1. 12

              It’s worth repeating, society has no intrinsic responsibility to support business as an aggregated constituent, nor as individual businesses.

              One might reasonably argue it’s in everyone’s best interest to do so at certain times, but something else entirely to defend sacrosanct business rights reflexively the moment individual humans try to defend themselves from the nasty side effects of business behavior.

              1. 2

                We absolutely have a responsibility to do so in a society where people rely on businesses for like… everything. You’re typing on a computer - who produced that? A business. How do you think most Americans retire? A business. How do new products make it onto the market? Advertising.

                I think it’s exactly the opposite situation of what you’re purporting. If you want to paint the “society without successful businesses is fine” picture, you have to do so.

                1. 5

                  Would it not be fair to suggest that there’s a bit of a gulf between businesses people rely on and businesses that rely on advertising? Perhaps it’s just my own bubble, dunno

                2. 0

                  Am I obligated to read a history book for you?

          3. 9

            Advertising predates the internet, and still exists robustly outside web & mobile ad banners.

            But even if it didn’t, word of mouth & culture can still inform about products & services.

          4. 7

            How in the world would anyone hear about your product and services without them, especially if they are novel?

            Have you heard of shops? It’s either a physical or virtual place where people with money go to purchase goods they need. And sometimes to browse if there’s anything new and interesting that might be useful.

            Also, have you heard of magazines? Some of them are dedicated to talking about new and interesting product developments. There are multiple printed (and digital) magazines detailing new software releases and online services that people might find handy.

            Do they sometimes suggest products that are not best for the consumer, but rather best for their bottom line? Possibly. But still, they only suggest new products to consumers who ask for it.

      4. 14

        Regardless how well PPA works, I think this is crux of the issue:

        Mozilla has just bought into the narrative that the advertising industry has a right to track users

        Even if PPA is technically perfect in every way, maybe MY personal privacy is preserved. But ad companies need to stop trying to insert themselves into every crack of society. They still have no right to any kind of visibility into consumer traffic, interests, eyeballs, whatever.

        1. 4

          PPA does not track users. It tracks that an ad was viewed or clicked and it tracks if an action happened as a result, but the user themself is never tracked in any way. This is an important nuance.

          1. 10

            Assuming that’s true (and who can know for sure when your adversary is a well-funded shower of bastards ad company), what I say still stands:

            They still have no right to any kind of visibility into consumer traffic, interests, eyeballs, whatever.

            1. 3

              What “visibility into consumer traffic, interests, eyeballs, whatever” do you think PPA provides?

              The crux of PPA is literally that an advertiser who runs ads gets an aggregate report with numbers that are not the actual conversion rate (the number of times someone who saw an ad later went on to buy the product), but are statistically similar enough to the actual conversion rate to let the advertiser know whether they are gaining business from running the ad.

              It does not tell them who saw an ad. It does not give them an identifier for the person who saw the ad. It does not tell them what other sites the person visited. It does not tell them what that person is interested in. It does not give them a behavioral profile of that person. It does not give them any identifiable information at all about any person.

              For years, people have insisted that they don’t have a problem with advertising in general, they have a problem with all the invasive tracking and profiling that had become a mainstay of online advertising. For better or worse, Mozilla is taking a swing at eliminating the tracking and profiling, and it’s kind of telling that we’re finding out how many people were not being truthful when they said the tracking was what they objected to.

              Personally, while I don’t like seeing ads, and on services that I use enough that offer me the option, I pay them money in exchange for not seeing ads, I also understand that being online costs money and that I don’t want the internet to become a place only for those wealthy enough to afford it without support. So having parts of the web that are paid for by mechanisms like advertising – provided it can be done without the invasive tracking – rather than by the end user’s wallet is a thing that probably needs to exist in order to enable the vibrant and diverse web I want to be part of, and lurking behind all the sanctimoniousness and righteous sneers is, inevitably, the question of how much poorer the web would be if only those people who can pay out of their own pockets are allowed onto and into it.

              1. 5

                I’m saying they don’t have the right to “know whether they are gaining business from running the ad.”

                It’s not necessarily bad for them to know this, but they are also not entitled to know this. On the contrary: The user is entitled to decide whether they want to participate in helping the advertiser.

                1. 1

                  Well, in order to even get to the point of generating aggregate reporting data someone has to both see an ad and either click through it or otherwise go to the site and buy something. So the user has already decided to have some sort of relationship with the business. If you are someone who never sees an ad and never clicks an ad and never buys anything from anyone who’s advertised to you, you don’t have anything to worry about.

                  1. 5

                    none of that contradicts the fact that advertisers are not entitled to additional information with the help of the browser.

              2. 3

                It does not tell them who saw an ad. It does not give them an identifier for the person who saw the ad. It does not tell them what other sites the person visited. It does not tell them what that person is interested in. It does not give them a behavioral profile of that person. It does not give them any identifiable information at all about any person.

                Question: how is the ad to be displayed selected? With the introduction of PPA, do advertisers plan on not using profiling to select ads anymore? Because that part of the ad tech equation is just as important as measuring conversions.

                1. 2

                  Fun fact: Mozilla had a proposal a few years back for how to do ad selection in a privacy-preserving way, by having the browser download bundles of ads with metadata about them and do the selection and display entirely on the client side.

                  People hated that too.

              3. 3

                Personally, while I don’t like seeing ads, and on services that I use enough that offer me the option, I pay them money in exchange for not seeing ads, I also understand that being online costs money and that I don’t want the internet to become a place only for those wealthy enough to afford it without support. So having parts of the web that are paid for by mechanisms like advertising – provided it can be done without the invasive tracking – rather than by the end user’s wallet is a thing that probably needs to exist in order to enable the vibrant and diverse web I want to be part of, and lurking behind all the sanctimoniousness and righteous sneers is, inevitably, the question of how much poorer the web would be if only those people who can pay out of their own pockets are allowed onto and into it.

                The Internet is already a place only for those wealthy enough to pay out of their own pockets for a computer and Internet connection that is fast enough to participate. Without ads, many sites would have to change their business model and may die. But places like Wikipedia and Lobsters would still exist. Do you really think the web would be poorer if websites were less like Facebook and Twitter and more like Wikipedia and Lobsters?

                1. 1

                  Someone who doesn’t own a computer or a phone can access the internet in many public libraries – free access to browse should be more plentiful but at least exists.

                  But web sites generally cannot be had for free without advertising involved, because there is no publicly-funded utility providing them.

                  1. 1

                    So you want to preserve ads so that people who rely on public libraries for Internet access can offset hosting costs by putting ads on their personal websites? That still requires some money to set up the site in the first place, and it requires significant traffic to offset even the small hosting cost of a personal website.

                    Clearly you have something else in mind but I can’t picture it. Most people don’t have the skills to set up their own website anyway, so they use services such as Facebook or Wikipedia to participate on the Internet. Can you clarify your position?

                  2. 0

                    I thought this discussion was getting really interesting, so I’m assuming it fell by the wayside and that you would appreciate me reviving it. Did you want to respond? Or would you rather I stop asking?

          2. 1

            but who views or clicks on the ad? it would have to be a user.

      5. 14

        Privacy Preserving Attribution is a sensible, well-designed feature and a giant step in the right direction for balancing privacy and the primary economic model of the Web, ads.

        There is a very simple question you can ask to discover whether a feature like this is reasonable: if the user had to opt in for it, how many users would do so if asked politely?

        This is innovation in the wrong direction. The actual problem is that everyone believes that ads are the primary/only economic model of the Web and that there is nothing we can do about it. Fixing that is the innovation we actually need.

        1. 4

          We could have non-spyware ads that don’t load down browsers with megabytes of javascript, but no-one believes that it is possible to advertise ethically. Maybe if web sites didn’t have 420 partners collecting personal data there would be fewer rent-seeking middlemen and more ad money would go to the web sites.

          1. 3

            Ads. We all know them, we all hate them. They slow down your browser with countless tracking scripts.
            Want in on a little secret? It doesn’t have to be this way. In fact, the most effective ads don’t actually have any tracking! More about that, right after this message from our sponsor:

            (trying to imitate the style of LTT videos here)

            We’ve got non-spyware ads that don’t contain any interactivity or JS. They’re all over video content, often called “sponsorships”. Negotiated directly between creators and brands, integrated into the video itself without any interactivity or tracking, most of the time clearly marked. And they’re a win-win-win. The creator earns more, the brand actually gets higher conversion and more control over the context of their ad, and by nature the ads can’t track the consumer either.

          2. 2

            Maybe if I could give half a cent per page view to a site, they’d make a lot more than they ever made from ads.

      6. 13

        Sure, but IMHO this is still not a reason to turn it on by default.

      7. 23

        The browser colluding with advertisers to spy on me is, in fact, not sensible.

        1. 6

          Please be clear about what “spying” you think is being performed.

          For example: list all personally-identifying information you believe is being transmitted to the advertiser by this feature of the browser.

          1. 5

            You can read documentation about the feature yourself.

          2. 3

            (Note that I’m not the parent poster, I’m just replying here because the question of what data is actually being tracked seems like the crux of the matter, not because I want to take out the pitchforks.)

            Reading through the data here, it seems to me like the browser is tracking what ads a user sees. Unfortunately the wording there is kind of ambiguous (e.g. what’s an “ad placement”? Is it a specific ad, or a set of ads?) but if I got this right, the browser locally tracks what ad was clicked/viewed and where, with parameters that describe what counts as a view or a click supplied by the advertiser. And that it can do so based on the website’s requirements, i.e. based on whatever that website considers to be an impression.

            Now I get that this report isn’t transmitted verbatim to the company whose products are being advertised, but:

            1. Can whoever gets the reports read them and do the tracking for the third-party website?
            2. If the browser maintains a list of ads (for impression attribution), can it be tracked based on the history of what ads it’s seen? Or as a variant: say I deliver a stream of bogus but unique (as in content, or impression parameters for view/click) ads, so each ad will get an impression just once. Along with that, I deliver the “real” ads, for shoes, hats or whatever. Can I now get a list of (unique bogus ads, real ad) pairs?
            1. 3

              I realise this is a hot topic for you, but if you’re bringing up the principle of charity, can we maybe try it here, too? :-) That’s why I prefaced this with a “I’m not the parent poster” note.

              That technical explainer is actually the document that I read, and on which my questions are based. I’ve literally linked to it in the comment you’re responding to. I’m guessing it’s an internal document of sorts because it’s not “very readable” to someone who doesn’t work in the ad industry at all. It also doesn’t follow almost any convention for spec documents, so it’s not even clear if this is what’s actually implemented or just an early draft, if the values “suggested” there are actually being used, which features are compulsory, or if this is the “final” version of the protocol.

              My first question straight out comes from this mention in that document:

              Our DAP deployment [which processes conversion reports] is jointly run by Mozilla and ISRG. Privacy is lost if the two organizations collude to reveal individual values.

              (Emphasis mine).

              Charitably, I’m guessing that the support page is glossing over some details in its claim, given that there’s literally a document describing what information about one’s browsing activities is being sent and where. And that either I’m misunderstanding the scope of the DAP processing (is this not used to process information about conversions?) or that you’re glossing over technical details when you’re saying “no”. If it’s the latter, though, this is lobste.rs, I’d appreciate if you didn’t – I’m sure Mozilla’s PR team will be only too happy to gloss over the details for me in their comments section, I was asking you because a) you obviously know more about this than I do and b) you’re not defaulting to “oh, yeah, it’s evil”.

              I have no idea what running a DAP deployment entails (which is why I’m asking about it) so I don’t really know the practical details of “the two organizations collude” which, in turn, means I don’t know how practical a concern that is. Which is why I’m asking about it. Where, on the spectrum between “theoretically doable but trivially detected by a third party” and “trivially done by two people and the only way to find out is to ask the actual people who did it”, is it placed?
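
              For what it’s worth, my loose mental model of the “jointly run” part is plain additive secret sharing between two aggregators; I have no idea whether DAP actually works like this, so treat it as a guess:

              ```ts
              // Hypothetical two-aggregator split: each browser sends one share to each
              // aggregator. A single share is uniformly random and reveals nothing on its
              // own; only the combination does. (Generic sketch, not DAP itself.)

              const MODULUS = 1_000_003;

              function splitIntoShares(value: number): [number, number] {
                const shareA = Math.floor(Math.random() * MODULUS);
                const shareB = (((value - shareA) % MODULUS) + MODULUS) % MODULUS;
                return [shareA, shareB];
              }

              // Each aggregator sums only the shares it received across many browsers;
              // "collusion" would mean exchanging per-browser shares instead of sums.
              function combineSums(sumOfSharesA: number, sumOfSharesB: number): number {
                return (sumOfSharesA + sumOfSharesB) % MODULUS;
              }
              ```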

              My second question is also based on that document. I don’t work in the ad industry and I’m not a browser engineer, so much of the language there is completely opaque. Consequently:

              1. I’m obviously aware that only conversions are reported, since that’s the only kind of report described there. But:
              • The document also says that “a site can register ad impressions [which they do] by generating and dispatching a CustomEvent as follows”. Like I said above: not in the ad industry, not a browser engineer, I have no idea what a CustomEvent is (there’s a generic sketch of the DOM mechanism at the end of this comment). In its simplest form, reading the doc, it sounds like the website is the one generating events. But if that’s the case, they can already count impressions, they don’t even need to query the local impression database. (The harder variant is that the event is fired locally and you can’t hook into it in any way, but it’s still based on website-set parameters – see my note in 5. below for that). I imagine I’m missing something, but what?
              • The document doesn’t explain what impression data is available to websites outside the report. All it says is “the target site cannot query this database directly”, which can mean anything from “the JS environment doesn’t even know it’s there” to “you can’t read it directly but there’s an API that exposes limited information about it”.
              1. The document literally lists “richer impression selection logic” and “ability to distribute that value to multiple impressions” as desirable traits that weren’t in the spec purely due to prototyping concerns, so I’ve certainly treated the “one ad at a time” limitation as temporary. And, in any case, I don’t know if that’s what’s actually being implemented here.
              2. The advertiser budget is obviously tunable, the document only suggests two, doesn’t have an upper cap on the limit, and doesn’t have a cap on how often it can be refreshed, either (it only suggests weekly). It also doesn’t explain who assigns these limits.
              3. was actually the subject of my first question and isn’t directly relevant here, although 5 is
              4. I obviously didn’t miss the part about differential privacy. My whole second question is about whether the target site can use this mechanism as a side-channel to derive user tracking information, not whether they can track users based on the impression report themselves, which they obviously can’t, like, that’s the whole point.
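
              For anyone else wondering what “dispatching a CustomEvent” even means, here is a bare-bones DOM sketch; the event name and detail fields are made up and not the exact shape from the explainer:

              ```ts
              // Generic DOM CustomEvent dispatch. The type and detail here are hypothetical;
              // the point is only that a page creates an event object and dispatches it, and
              // anything listening for that event type can react to it.

              const impressionEvent = new CustomEvent("hypothetical-ad-impression", {
                bubbles: true,
                detail: {
                  adId: "example-ad-123",      // hypothetical identifier for the creative
                  targetSite: "shop.example",  // hypothetical site where a conversion would count
                },
              });

              document.dispatchEvent(impressionEvent);
              ```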
    10. 11

      Regarding PPA, if I have DNT on, what questions are still unclear?

      Regarding the primary economic model, that’s indeed the problem to be solved. Once print had ads without tracking and thrived. An acceptable path is IMO payments, not monetised surveillance. Maybe something similar to https://en.wikipedia.org/wiki/VG_Wort

      and regarding opt-in/out: one doesn’t earn trust by going the convenient way. Smells.

      1. 22

        Once print had ads without tracking and thrived.

        Once Google had ads without tracking and thrived, enough to buy their main competitor Doubleclick. Sadly, Doubleclick’s user-surveillance-based direct-marketing business model replaced Google’s web-page-contents-based broadcast-advertising business model. Now no-one can even imagine that advertising might possibly exist without invasive tracking, despite the fact that it used to be normal.

        1. 7

          It’s funny because not once in my entire life have I ever seen an invasive tracking ad that was useful or relevant to me. What a scam! I have clicked on two ads in my entire life, which were relevant to me, and they were of the kind where the ad is based on the contents of the site you’re visiting.

          1. 3

            great illustration of how the impact of ads is disparately allocated. some people click on ads all the time and it drains their bank account forcing them into further subordination to employers. this obviously correlates with lower education and economic status.

    11. 9

      why should the “primary economic model of the Web” be given any weight whatsoever against user control and consent?

  1. 3

    I love modal editing, and it is nice to see people write about it. But often they make it seem like the vi/vim/neovim style of editing language is the only one. I tried using neovim as my main editor for a couple of years, but I gave up on it because even after all that time, I found it difficult to use.

    But then I discovered alternative takes on modal editing, like Kakoune and Helix. Now I use Kakoune all the time. It also has a learning curve, but one that I am actually climbing. I get more effective with it over time.

    So keep writing about modal editing, but don’t pretend it is the best and only solution.

    1. 4

      I also love modal editing. I also love pluralities & exploration, so I’d like to +1 that while vi/vim/nvi implies modality, modality does not imply vi/vim/nvi. My humble message for those who are unaware is that emacs lets you use & configure whatever modal system you’d like, including your own creations or variations. ‘Evil’ is a popular choice, since it essentially replicates vim (along with ex), but there’s also meow, a kakoune mode, god-mode, and who knows what else.

  2. 30

    JSON-LD may be everywhere, and search engines might use the info it provides, but this is still a long way off from the original promises of the semantic web.

    Wasn’t the original vision that the whole web would turn into a public global database where it would be easy to integrate data from different sources? That online data would be so structured that you could have autonomous agents navigating the information for you?

    In practice, any data that is even slightly dynamic, or structured, is not published directly on the web in machine-readable form. Instead, it is served as an API that you may or may not have access to. Beyond the use case of “I want to be indexed by a search engine”, I don’t think there has been any significant progress.

    Edit: I found this nice retrospective about the Semantic Web that goes into more detail into what has happened.

    1. 20

      To me, one of the big promises of the semantic web was to reach the logical conclusion of a separation of content and presentation. Rather than a web page being the interface, the web page would be a front end that connected to a service and many others would also be possible.

      In hindsight, this was obviously going to be killed by companies like Google that relied on ads. If the user agent is getting semantic markup and presenting it in the format that the user desires, what kind of user is going to choose to see ads? If you get an XML block that contains content and ads, you’ll render the content and drop the ads on the floor.

      All of the associated problems with ontology drift seem delightfully technology agnostic and are present in every attempt to build something like a semantic web.

      1. 5

        This sounds a bit like ActivityPub, which is the social media version of this. With ActivityPub, it’s basically email++: you send a stream of signed semantic data in the form of JSON-LD to different inboxes. Afterwards these activities can be interpreted in different ways.

        1. 1

          ActivityPub definitely follows these principles. It lacks the self-describing and extensible things that the semantic web wanted to build on top. For example, Mastodon implements some APIs on top of ActivityPub, but you can’t generically point at an ActivityPub server, discover the things it can do, and get structured data. It’s very use-case specific. For example, it isn’t expressive enough to expose something like a flight-booking system in a way that would let generic things determine that this site is a flight-booking service, present the flight data in a generic structured-data view, and then book a flight, without using the web site’s UI.

          1. 1

            Yeah, I don’t think it could do that.

            Though since ActivityPub uses ActivityStreams, and they can be extended, you could in theory have a generic ActivityPub server (not Mastodon) where you could send a stream of arbitrary activities (annotated this website, marked an article as boring, commented on XYZ’s blog post) that then get forwarded to the correct recipients or to someone’s public outbox.
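
            A rough sketch of what one of those arbitrary activities could look like as an ActivityStreams object; the extension term and URLs are made up:

            ```ts
            // Hypothetical ActivityStreams activity using a made-up extension vocabulary.
            // "MarkedBoring" is not a standard Activity type; the @context entry maps it
            // to a custom IRI so other servers can at least identify (or ignore) it.

            const activity = {
              "@context": [
                "https://www.w3.org/ns/activitystreams",
                { MarkedBoring: "https://example.org/ns#MarkedBoring" },
              ],
              type: "MarkedBoring",
              actor: "https://social.example/users/alice",
              object: "https://blog.example/posts/xyz",
              published: "2024-07-01T12:00:00Z",
            };
            ```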

          2. 1

            do you think a truly generic structured-data view would actually be usable for a wide variety of use cases? seems to me that the best interfaces would require some presentation information specific to the use case, either in the client or in the data sent from the server.

    2. 8

      The reasons for failure are entirely economic, not technical. Everyone wants humans to visit their sites and pay with their eyeballs. No one wants to share data with robots for free.

      The Semantic Web dream requires everyone to wake up one day and decide to be altruists.

      Maybe we can have LLM agents fight their way past captchas to do real scraping and integration work for you…

      1. 3

        That depends a lot on the kind of site. For example, if you’re an airline you’d have thought that making it easy for third-party agents to find the routes you fly, check availability, and book tickets would be useful. Unfortunately, most airlines sell this kind of access, presumably because they make more money from selling the access than they lose from making it harder to buy flights.

        I’d have thought things like government sites would have a strong incentive to make access easier, but mostly this is done with different technologies (some are really annoying. The UK tax office decided to switch from XML to CSV for their exchange rate data and not convert the old ones).

        1. 1

          I really don’t know the economies of airline scheduling and booking, but the fact is you can quite easily visit a website that enables you to book your own flight from a number of airlines. As with price comparison websites, most airlines probably realize this is a good marketing channel, not a competitive net negative.

      2. 2

        The reasons for failure are entirely economic, not technical. Everyone wants humans to visit their sites and pay with their eyeballs. No one wants to share data with robots for free.

        As difficult (and unpopular around here) as it might be, I wonder whether this does admit a technical (and partly economic) solution involving sufficiently locked-down clients that can be attested to pay (whether via ads or via paid subscriptions).

        1. 2

          Well, there is an HTTP status code set aside for this: “402 Payment Required”. I’m not sure anyone has worked out a viable method, though.

    3. 5

      I think semweb died for a bunch of reasons, but I think two big ones are:

      1. lots of people who have interesting datasets have financial or legal incentives not to share them
      2. there was rarely any business case for sharing datasets: no clear way that the publishing organisation will recoup the costs or benefit in any way from formatting and publishing the data

      I think huggingface, etc are some of the inheritors of this legacy: they provide hosting for loads of datasets and models, and are facilitators of research with those objects.

      1. 6

        Back when the idea of the semantic web was conceived, everyone thought that the future of the web would be fully decentralized. Everyone would have their own webserver, and so would every business. When talking about data, it was not (always) about large interesting datasets. For instance, the opening and closing times of your local supermarket would be machine-readable and available on their website. So your todo app and calendar could take that automatically into account when planning your day.
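
        That specific example is roughly expressible today with schema.org JSON-LD; a minimal sketch (store name and hours are invented):

        ```ts
        // Minimal schema.org-style markup for a supermarket's opening hours,
        // shown here as a plain object. All values are invented for illustration.

        const storeMarkup = {
          "@context": "https://schema.org",
          "@type": "GroceryStore",
          name: "Example Supermarket",
          openingHoursSpecification: [
            {
              "@type": "OpeningHoursSpecification",
              dayOfWeek: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
              opens: "08:00",
              closes: "20:00",
            },
          ],
        };
        ```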

        The idea was this was a way for businesses to attract customers. You would opt for the business that your agents can read, not the one where you have to look up the times yourself. That was the incentive.

        But, as we learned, no-one wants to run their own server and their own website. Too much work. You can just outsource that to a big company like Meta by creating a Facebook page. Now all these small data fragments become one big dataset in the hands of that big company. And now there is money to be made with it, so they don’t want to share it.

  3. 34

    It has been eight years since left-pad first made people realize that, hey, maybe it’s not a great idea to outsource trivial functionality to random people on the internet.

    It has been 8 years and people have not yet realized the flaw of left-pad was that NPM allows the developer to completely remove packages from the package manager instead of only being allowed to mark it as yanked. I’m utterly shocked how difficult this is for anti-micropackage people to comprehend. Every single argument against micropackages can be addressed with a proper package manager that doesn’t allow completely pulling a package and allows the community to warn if a package has a security flaw in it, all things cargo does. I truly, honestly don’t think people who hate micropackages are even proper developers, or else they’d not only know how to solve these issues without throwing the baby out with the bath water, but also would be far more sympathetic to developers who are more or less just trying to get stuff done. I’ll at least give the author credit though, they state “should be copied into the source” instead of just outright saying it shouldn’t be used at all.

    It cannot be stated enough: if you have a problem with micropackages, take it up with NPM, because most other things handle it sanely, unlike NPM, which I have even personally labeled as outright malware for how utterly horrible NPM is, even being responsible for some security flaws in the past. There is a certain point where negligence becomes criminal negligence, and NPM has long passed it.

    1. 33

      While npm certainly has a lot of very special problems, it’s also obvious that if you copy-pasted left-pad into your project instead of using a package manager, you would have been totally unaffected by the left-pad fiasco.

      This is the fundamental point of the article. Third-party code can hurt you in ways first-party code cannot. Therefore, if the code is trivial, don’t expose yourself to that pain.
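
      For context on how trivial the code in question is, a rough reimplementation (not the original left-pad source) fits in a handful of lines:

      ```ts
      // Roughly what a left-pad helper does: prepend a fill character until the
      // string reaches the requested length. Not the original package's code.

      function leftPad(str: string, len: number, ch = " "): string {
        let padded = str;
        while (padded.length < len) {
          padded = ch + padded;
        }
        return padded;
      }

      leftPad("42", 5, "0"); // "00042"
      ```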

      1. 6

        I tend to agree with the parent that the plethora of ridiculously tiny “libraries” is a problem - realistically they’re an injustice to the term “library”. Most of them could be reliably written by hand on the back of a napkin. However, you’re also right that that is a different problem from dependencies being yanked upstream, which can happen to dependencies of any size.

        My solution to this is incredibly low-tech and simple: vendor your dependencies. Commit them to your repo. You will get automatic protection from them disappearing, and a number of other benefits.

        1. 3

          and a number of other benefits.

          And a huge number of drawbacks.

          1. 2

            Every decision in life is a tradeoff, and everyone will weigh each differently.

          2. 2

            Can you share what you have in mind? In some cases I find vendored dependencies worrisome, and have spent a lot of time over my career trying to provide the obvious benefits without triggering the drawbacks.

            But in the context of front-end javascript, I’m a huge fan of vendoring my dependencies, and once the work is done to be able to update those whenever I care to and to follow the licenses if I’m distributing the source code, I can’t think of a big pile of drawbacks in that context.

            1. 1

              Auditing, traceability, the ability to natively upgrade your dependencies. You give up many things by vendoring without writing your own tools to manage those dependencies, and if you do write your own tools, then you’ve just ad-hoc created a package manager yourself. Not to mention that the size of dependencies often conflicts with your SCM and blows up your repo size.

              1. 1

                I know this is a week old, but I just noticed your reply here.

                Auditing, traceability, the ability to natively upgrade your dependencies

                I don’t know how it is for all languages but for languages like javascript where dependencies get pulled into a local directory by your package manager, “vendoring” just means including that directory in your VCS, rather than excluding it. It doesn’t impact the code that runs in any way - it’s just about whether it gets onto the disk from the VCS or from the package manager.

                Not to mention the size of dependencies often are in conflict with your SCM and blow up your repo size

                Microsoft famously (https://devblogs.microsoft.com/bharry/the-largest-git-repo-on-the-planet/?ref=blog.gitbutler.com) upstreamed patches to Git so they could use it as a practical VCS for the 300GB/3.5M file Windows repo.

                Look if your dependencies are somehow larger than the entirety of Windows, sure. Don’t vendor them. Realistically this sounds like an imagined/theoretical issue that won’t actually affect 99.9999% of projects.

    2. 24

      the flaw of left-pad was that NPM allows the developer to completely remove packages from the package manager … Every single argument against micropackages can be addressed with a proper package manager that doesn’t allow completely pulling a package and allows the community to warn if a package has a security flaw in it,

      That NPM allowed left-pad to be entirely yeeted out of the package ecosystem was certainly a critical issue. But I think that problem applies (applied?) equally to packages of any size, right?

      The problem with micro-packages is not so much about the mechanics of their management, I think it’s more about the impact they have on the cumulative risk of downstream projects that depend on them.

      1. 1

        [EDIT] Sorry was replying to the wrong comment, got both replies at the same time and wasn’t paying attention (the life of a multimonitor setup with a stream playing in the background)
        (Also lobsters should let you add a delete reason)

    3. 10

      Every single argument against micropackages can be addressed with a proper package manager

      This is certainly not true. Dependencies carry lots of costs and risks, package manager or no package manager.

      I truly, honestly don’t think people who hate micropackages are even proper developers

      Ah, gate keeping. Are these proper developers the ones who are “more or less just trying to get stuff done” as opposed to considering the longterm costs and risks of their actions to the code base?

    4. 9

      It has been 8 years and people have not yet realized the flaw of left-pad was that NPM allows the developer to completely remove packages from the package manager instead of only being allowed to mark it as yanked.

      You are basically saying that in addition to all the negative points mentioned, you can also have a bad package manager? So there are even more downsides than listed? Otherwise, it would be covered by the third, “the author may be overtly malicious”

      I truly, honestly don’t think people who hate micropackages are even proper developers, or else they’d not only know how to solve these issues without throwing the baby out with the bath water,

      I don’t “hate” them. I am just very sensitive to the downsides of dependencies, for all the points listed in the article, so I never use them. I honestly don’t understand why one would import something like isNumber or leftPad and why make oneself vulnerable like that, just to avoid typing a few lines of code. Even if you have the perfect package manager, it still makes many things slower and dependent on a working internet connection.

    5. 9

      I truly, honestly don’t think people who hate micropackages are even proper developers

      “If you don’t agree with my views you are not a proper developer”

      Couldn’t the same be said for the opposing view? People who like micropackages are not proper developers if they can’t implement these dependencies by themselves?

      I don’t agree with either view. This is a project specific decision. If you’re “more or less just trying to get stuff done”, then sure, go for it. For some, the bloat of having multiple versions of the same transitive dependency bundled in your binary has a huge cost (compile times, binary size, vulnerabilities etc). Cargo, for example, will use incompatible versions of the same library 0.

      I’m sure in both cases these are conscious decisions made by “proper” developers. :-)

    6. 7

      I agree with your point, but I guess you didn’t realize that npm immediately addressed this issue, since it sounds like you’re assuming left-pad is still possible on npm: you can no longer unpublish a package except under specific circumstances, and this policy has prevented left-pad recurrences:

      https://docs.npmjs.com/policies/unpublish

      https://stackoverflow.com/a/69420293

      1. 5

        I know they addressed the issue, but thank you for telling me, regardless. I remember being slightly (not completely, but at least a little bit) offended they explicitly broke their own rules and went against the developer’s wishes (however psychotic they were) to re-publish the package. This scenario should not have been possible in the first place, and thanks to NPM’s negligence, not only did a travesty happen, but they also had to violate someone’s wishes to remedy it. It’s an especially dark page in computing history that I am not happy about in the slightest.

    7. 5

      It’s quite frustrating that I see blogs nearly a decade later making this mistake. It is not an unknown thing that this criticism is at best confused, and at worst completely invalid. People have been explaining this over and over and over again.

      It takes very little research into the incident to realize what the root cause is - not that a package was small, but how NPM handled its removal. This means that leftpad is actually not a great example of “small packages are bad” unless you are willing to talk about the incident as a whole and then justify that the fact that leftpad was small uniquely contributed to the incident in a major way. I’ve never seen a blog do this, although sometimes I’ve seen a comment sort of do this.

      1. 3

        OP here. See my reply to the top comment. I am fully aware that the root cause of left-pad was the package being removed from npm, and this does not diminish my point. Take a step back from the technical details and look at the high-level facts: people chose to depend on a trivial micro-library, the author did something bad, and suddenly everyone suffered.

        1. 4

          My point is that your post would be better served without even mentioning leftpad since leftpad in no way supports your points and only confuses things.

          I’m fine with suggesting that ~5 lines of code doesn’t need a package. In fact, I’d support that.

          1. 5

            Left pad is a natural result of micro-libraries. Your exposure to random one-person-maintained libraries explodes in a microlibrary ecosystem. Trusting more people adds risk since any one of them could be compromised (or jerks).

    8. 4

      Where are all these developers that think npm is good or even acceptable? Granted I don’t do much frontend work, but I haven’t talked to one in at least five years. Are they all just … hanging out on HN and Twitter I guess?

      1. 10

        It’s not that these developers think NPM is good, but I feel like their only experience with language package managers is with NPM, so they assume it’s the norm. They need to understand this is not the norm, and in fact most other languages don’t have to deal with anywhere near the amount of problems NPM presents. NPM gives so many people such a pessimistic outlook on the entire developer ecosystem, which is why I so passionately made it apparent that NPM is causing criminal damage, because it is.

        1. 3

          I love how the JS ecosystem is so messed up that there exists a tool, corepack, to manage your package managers.

          1. 3

            I love how the JS ecosystem is so messed up that there exists a tool, corepack, to manage your package managers.

            You’ll love this then https://socket.dev/blog/node-js-takes-steps-towards-removing-corepack

            1. 6

              Not to worry. Someone will surely implement a tool to handle whether corepack is present or not. That way we can have a manager manager manager.

      2. 5

        How is npm any different from any other package manager? Say pip, cargo, conan/vcpkg, apt, yum, etc…?

        One should not confuse the technology, the repository and those who abuse it. No other package manager can prevent the existence of micro-packages. All you can do to prevent it is to raise the barrier of entry. For apt/yum/etc… the barrier of entry is getting your package accepted into the distro’s official repository, which is not done with a single command and requires human interaction.

        1. 7

          Package managers for language-level packages (pip, cargo) are categorically different than package managers for system-level packages (apt, yum). Python packages bear little to no resemblance to Debian packages.

          1. 4

            Only their implementation and how their repositories are managed differ. From a high-level point of view, they solve the same problem, and they are subject to the same risks (micro packages, supply chain attacks, …).

            System level package managers have a higher barrier to entry because packages must be curated/audited/validated before being accepted in the official repository. But that has nothing to do with the technology itself. We could have such a barrier of entry for language-level package managers.

        2. 5

          How is npm any different from any other package manager? Say pip, cargo, conan/vcpkg, apt, yum, etc…?

          It’s much slower than it has any good reason to be, and its handling of lockfiles has had many serious bugs if you run on multiple OSs / CPU architectures (although I haven’t run into that as recently).

          1. 4

            My point was at a higher level. NPM (the package manager) is not responsible for the presence of micro-packages on NPM (the repository). The ones at fault are the ones who granted publishing rights to absolutely every random person on the internet. And every single language with a package manager and a public, non-curated, non-audited package repository will eventually suffer from the same problem, and should face the same criticism.

            Obviously, there are technical differences between each implementation of the concept of “package manager”, but that was beyond my original point.

        3. 3

          You can’t just delete a crate from cargo. It can’t just disappear. Yanked crates can still be downloaded; they just won’t be used for new dependency resolutions unless they’re already referenced in the lockfile.

          https://doc.rust-lang.org/cargo/commands/cargo-yank.html

          It’s possible for the same thing to happen if a DMCA takedown or other legal requirement to remove the crate is issued. But there’s a native, non-breaking way to do this.

          https://forge.rust-lang.org/crates-io/crate-removal.html

          With leftpad the author was able to just remove all of their packages. That just isn’t possible with crates.io.

        4. 2

          How is npm any different from any other package manager? Say pip, cargo, conan/vcpkg, apt, yum, etc…?

          I haven’t used any of the ones you’ve listed except for apt, so I’ll answer for that.

          The main difference in design is that apt is intended for end users, not developers. Because of this, there are a lot of things that work very differently; the main one being that normally only a single version of any given package can be installed at a time; if you want two versions to be installed at the same time, they must be treated as separate packages. This would be intolerably tedious for libraries during development, but 99% of the time it’s what you want for applications as a user. The other main difference is that packages in apt have quality control measures applied to them; they must follow a strict policy which has been carefully designed to keep the system from breaking, because when things break, it directly affects end users! It is a lot more work to package things with apt, (especially in Debian, which has stricter rules than other variants like Ubuntu) but when you do the work, it basically never breaks on the stable release.

          1. 1

            What prevents the maintainers of the NPM/Pip/Cargo/… repositories from implementing the same strict rules to accept packages?

            The real problem is “central open-bar repositories where everyone can submit their own trash”. Curation is key to quality.

            1. 2

              Well, if npm changed their policies, it would cause a huge uproar because it would create a bunch of extra work for people who today do nothing.

              But sure, imagine they were starting from scratch. Who’s going to write the policy? Who’s going to do the actual work of packaging? In Debian you have the packager role which is distinct from the upstream developer; someone who’s an expert in integrating a given component into a larger whole and understands the context.

              The policy of Debian is centered around a release that only has to work on one operating system and one set of library versions. Even with this drastic cut in scope, (vs npm which will run on any OS with any set of other libraries that might conflict with it) it’s still a ton of work to ensure all the components work correctly with the other ones. That’s why it takes many years for a Debian release to happen. What’s the most common complaint people make about Debian packages? They always say “they’re too out of date”, and the implication is that this is because Debian packagers are lazy, but the truth is that getting something that actually works just takes a lot of time!

              But anyway, even if you did all this work and set this all up, probably JS developers wouldn’t actually use it, because they care more about having the absolute latest version than having things actually work. These are the kinds of people who think that npm is a totally fine system today.

    9. 3

      I truly, honestly don’t think people who hate micropackages are even proper developers

      Hot take. One thing your comment does not address is the waste induced by “micro-dependencies”. Rust and Node-based projects are, in my experience, often the largest in terms of filesystem size due to the number of transitive dependencies. Hopefully you do not consider concern about waste (and a desire to minimize it) to make one an “improper” developer.

    10. 1

      It can not be stated enough, if you have a problem with micropackages, take it up with NPM, because most other things handle it sanely,

      I can’t even think of any microlibraries in the Rust ecosystem, honestly, if we use the metric people are using in this thread: libraries you could code yourself in 3 minutes.

  4. 56

    It makes me sad to see “Accessibility features” as a planned feature. Accessibility is something that should be built into the design and the code interfaces: elements should be scalable, screen-read-able, navigable with a variety of input devices, internationalisable, along with other considerations. It becomes hard to tack accessibility on to an existing project or codebase if it’s not designed with accessibility in mind. There isn’t much documentation so it’s hard to gauge how the project will handle it.

    1. 19

      You generally want feedback on more foundational elements before adding the surface area of accessibility. Further, if accessibility is that critical to the user it probably isn’t advisable for them to use an alpha-grade desktop environment release even if there was an intent to provide accessibility in that release.

      1. 38

        You generally want feedback on more foundational elements before adding the surface area of accessibility.

        I think accessibility ought to be considered a foundational element.

        1. 1

          I suspect you don’t mean “accessibility features should not have any dependencies”, that is, that accessibility should not be built upon the same components that are used for “non-accessibility” features, which is what the parent is arguing. Presumably you mean that the core components should not be released or iterated on until the accessibility features which build upon them are ready?

          1. 7

            I can’t speak for the original poster, but I took it to mean “dependencies should consider accessibility in design”.

          2. 6

            Presumably you mean that the core components should not be released or iterated on until the accessibility features which build upon them are ready?

            I mean, as ~jmelesky points out, that accessibility should be a fundamental part of the design of those core components. It’s not that accessibility should be a ticket that blocks release, it’s that at every stage of readiness accessibility should be built-in at the same degree of readiness, just as “windows can be relocated on the screen” is a fundamental design principle of the system.

            As used in this sense, “accessibility” is a pretty big term which could mean anything from “text color contrast can be set explicitly” to “textual content is available to screen readers” (itself a big topic). But it’s the same with window relocation. Windows can be floating or docked or split across screens or sticky, and so you have to design a system that has the capability to meet those requirements without radical change. If you don’t start from day 1 with a recognition that a window could be on two screens, when you get to HiDPI support you’re going to have a bad time.

    2. 18

      Accessibility is something that should be built into the design and the code interfaces:

      Seems like that’s what is happening? They are running a fork of the upstream GUI lib with AccessKit patched in

      https://github.com/iced-rs/iced/issues/552#issuecomment-2180837663

    3. 44

      It makes me sad to see people shitting on every single fresh graphical thing anyone makes because of this.

      I bet it is hard to tack on. I bet it’s also pretty hard to get an alpha release of a DE out and also have all the elements be scalable, screen-read-able, navigable with a variety of input devices, internationalisable, along with other considerations.

      1. 38

        It makes me sad to see people shitting on every single fresh graphical thing anyone makes because of this.

        It is bad that excluding blind and low vision people is the norm. Frankly, it’s one of the worst things about Linux on the desktop today. Social pressure is an important part of changing that.

        Remember, disability is the one minority you can join at any moment.

        1. 13

          I don’t think anyone is proposing excluding blind or low vision people. In fact TFA states that the design for accessibility is in progress. So we’re talking about whether accessibility needs to be part of the alpha release or not.

          And if you want to advocate for earlier consideration of accessibility features, there are certainly more compelling approaches besides aggressive negativity (e.g., “shitting on” others’ work). Cynically, it often feels like aggressive negativity is precisely the point on the Internet, and we just rationalize post-facto some noble reason to justify said negativity. This feels like a social media pathology that we need to address.

          1. 4

            I find it very intellectually dishonest to look at a criticism you disagree with and dismiss it as a post-facto rationalization of a desire to be negative at any cost in the absence of any evidence except some nebulous “social media pathology.”

            1. 6

              I agree that would be very dishonest, but to be clear I wasn’t dismissing your comment (which I addressed in my first paragraph). My remarks about social media pathology were in response to your advocacy for using “social pressure” (which given the context of the thread seemed to be a euphemism for aggressive negativity). Specifically that as a tactic, it is my belief that it doesn’t work very well and more frequently it seems to be deployed to cruel ends. I’m not suggesting you are being cruel or advocating cruelty, but rather cruelty may be an unintended consequence of advocating “social pressure”.

              1. 3

                Ah, understood, thank you for clarifying.

        2. 20

          Should I go blind, I hope System76 will be kind enough to offer me a full refund for Cosmic.

          Being blind surely sucks. It’s one more thing that sucks about it. It’s not System76’s job to solve everyone in the world’s problems. It’s not like I don’t have empathy for blind people, but it costs a lot of time and money to build a DE, and the existing DEs for Linux suck even for sighted people. I applaud the attempt to improve the status quo, even if it might not help blind people for a while (or ever).

          If you say people shouldn’t even start such projects until they’ve got a plan for making them usable by blind people, you may just never see such projects started at all.

          1. 20

            It’s very difficult for me to read this post as anything other than, in essence, “As long as I get what I want, I don’t care whether or not blind and low vision people can share in that success.”

            If you say people shouldn’t even start such projects until they’ve got a plan for making them usable by blind people, you may just never see such projects started at all.

            It is fine for people to start projects with or without support for accessibility tools, but I’m certainly not going to invest my time or energy into learning or contributing to something that a) doesn’t work for a fairly large group of free desktop users and b) wouldn’t work for me if I were injured or if my vision loss continued to progress.

            You’re acting like the disabled folks on here are trying to shut down development of this project or something. We’re just saying that it doesn’t work for people like us and that we don’t like that; it’s no different from someone saying they wish a new editor had a Vim emulation mode, except Vimmers can learn chorded editors, but I can’t learn to see better.

      2. 23

        Why do we tolerate posts like this? Excluding community members like they’re simply worthless. I got pretty much the same negative reply when I said I was sad that a stripped down Qt decided to dump all of the accessibility features.

        One day you’ll have a disability. Or someone in your family will. If you live long enough, it is inevitable. It would be good to care about other people before that happens, if only out of self-interest.

        You can’t bolt this on. Look at Wayland. Even today it’s a massive step backward in accessibility compared to X11. It’s unlivable.

        I hope that one day we’ll get to the point where posts like yours are treated like any other kind of unacceptable vile discrimination against a marginalized group.

        1. 37

          Not having a set of features there in an alpha release is not exclusion. If they’re not there in the actual release, then it’ll be exclusion. Right now it’s just not doing everything at the same time. Can we wait a while before deciding they’re bad people?

          Calling it vile discrimination when someone points out that there’s a bunch of engineering decisions that need to be made in a large project is a wild overreaction.

          1. 19

            The problem with postponing accessibility features is that, much like its far more glamorous cousin, security, accessibility is very hard to bolt on top of a codebase that wasn’t built for it. Postponing it for the final release is an engineering decision just like postponing authentication and access control for the final release is an engineering decision. Yes, you can’t do everything at once, but some things you can only do in the beginning.

            It’s particularly problematic for a desktop environment, which depends on a healthy base of third-party applications (System76 won’t be able to literally write every app one uses, or integrate all the applications out there), so you depend on third-party developers following a set of conventions. If those conventions don’t exist, or if they change too often, the few people who follow them will quickly fall behind.

            It’s not like we haven’t seen this in the Linux desktop space, we’ve got like 25 years of lessons to go by.

            Not having it in the alpha release practically guarantees that, even if it’ll eventually make it into a product roadmap, it will be so bad and/or require so much maintenance and constant integration work it might as well not be present at all.

            Also, hyperbole aside, it’s worth remembering that System76 is a private company, not a loose group of FOSS hackers. Engineering decisions are subordinated to company policy, not the other way around; they literally have product roadmaps and can tell people what to work on. In System76’s roadmap, docks that either float or extend to edges, or wallpaper cycling, were simply more important than accessibility features.

            1. 7

              Who says they need to start from zero when they are going to implement the feature? It could very well be that a lot of groundwork is already present, but that it is just not enabled yet.

              It says: “Other incomplete but planned features”. So incomplete, not absent.

              1. 4

                Cosmic uses iced, which does have some limited cross-platform accessibility support, and System76 is the main (and AFAIK the only) contributor to the Linux implementation, based on AccessKit (which is, itself, cross-platform, but was Windows-only until about a year and a half ago IIRC). They are pretty much building it from scratch; they’re the ones laying the groundwork :-).

                1. 13

                  Hi, lead AccessKit developer here. Yes, AccessKit was Windows-only until late 2022. The work to implement an AccessKit backend for free desktops, by implementing AT-SPI, has been done primarily by the other main AccessKit developer, Arnold Loubriat. I’m happy that the Cosmic team is building on our work to implement accessibility in their fork of iced, though I regret that I haven’t yet taken time to review their work.

            2. 4

              Oh, yeah, it’s absolutely possible it’s a bad engineering decision they’ll come to regret, but I’d still reserve judgement on their character until they’re actually saying it’s 1.0.

        2. 20

          Excluding community members like they’re simply worthless.

          That seems a little excessive. A more charitable interpretation is “not enough time”, or “don’t know how”, or pretty much anything else.

          As someone doing Open Source for 25+ years, I’ve seen this a lot. I’ve been labelled a terrible person for not writing enough documentation. Or for not translating the documentation to Serbian or Xhosa. At the same time, there’s zero expectation that the people complaining should contribute anything.

          And that’s the issue for people who get things done. When we give away free software, we don’t owe users anything. It is (I believe) entirely immoral for users to sit there, doing nothing, all the while shitting on the people who do the actual work. It’s immoral to complain when you get something for free. It’s immoral to demand that others work for you, for free. It’s immoral to put down people for giving away free things.

          The comment I quoted above is trying to avoid the negative mistake of not including people. However, that comment is making the positive mistake of treating contributors badly.

          My message for 25+ years has been: “You’re complaining? Great! Submit a patch!”. The #1 response to that request is for people to run away. The #2 response is for people to tell me I’m an asshole for asking them to do things. The #3 response is “I don’t know how”.

          I’m a little mystified as to how people expect to learn anything if they’re not going to try. I’m resigned to the fact that most people would rather complain / demand / curse at me, rather than contribute.

          So I’m firmly on the side of “don’t look a gift horse in the mouth”.

        3. 16

          Do you have similarly strong feelings towards software that is not fully localized to all languages on its early development stages? Or, towards software that is only targeted to programmers? Because, there are many more non-English speakers and non-programmers than visually impaired people in this world.

          1. 13

            Do you have similarly strong feelings towards software that is not fully localized to all languages on its early development stages?

            I don’t think that analogy holds up. There are two cases:

            • You didn’t localise the application.
            • You shipped a GUI toolkit that doesn’t allow localisation.

            The equivalent is:

            • You didn’t include a screen reader.
            • You didn’t include any hooks that a screen reader could use.

            In both cases, the former is fine because you may not have the expertise to fix it well, but the latter is not because you’ve decided early on in your design to exclude people.

            1. 7

              It’s alpha software. I agree that it would be preferable for a desktop environment to have good screen reading integration (to not exclude visually impaired people), good localization (to not exclude non-English speakers), good and polished visual design (to not exclude people with high aesthetic sensibilities), extremely good performance (to not exclude people who can’t stand slow software), very low hardware requirements (to not exclude people who have very old computers), etc…

              But again, it’s alpha software. You’ve got to release something at some point. And iterate. And it’s never going to be perfect. And that’s fine.

              I don’t really see how it could be constructive to criticize a project released as OSS and treat it as “vile discrimination” just for not having one of the features above (and therefore exclude some people) during its alpha stage. I think all software will exclude some cohort of people one way or the other. That’s one of the nice things about having a healthy and diverse OSS culture: different people will make software for different needs, so we can have options.

              We can choose to contribute to projects that do have the priorities we like, or donate to them, or at least promote them online. I think that’s a more constructive approach than publicly accusing the projects we don’t like of immoral things.

          2. 9

            I’m a non-English speaker who learned to use computers before being fluent in English by any measure, so I have first-hand experience when saying this is a completely false equivalence. Both in technical terms, as @david_chisnall mentioned here, and in non-technical terms.

            The only thing standing between someone who doesn’t speak English and using a computer for basic productivity tasks is knowing some English words. Where I’m from, formal English literacy is basically zero for people up to my generation and they use computers, phones and tablets just fine, even though localisation is still quite infrequent (and was also practically zero up until ten years ago or so, and nowadays it’s usually so bad you’re better off using the English version anyway). Even people from my generation (we were the first ones who could easily learn English, but even for people my age access wasn’t exactly universal) largely learned to use computers going on whatever English they’d picked up from watching Hollywood movies.

            Advanced computer use certainly requires a higher level of proficiency, and that, in turn, certainly requires learning time that not everyone has, and access to learning resources that not everyone can afford. Yes, that’s a problem.

            But if you are blind, no amount of personal effort or learning will help you see again, and if you can’t do something, no one will be able to help you over the phone. Worst-case scenario if you don’t know English is something takes longer because you need to grab a dictionary. Worst-case scenario if you can’t see well is anguishing over which one of your friends to bother for help this time because you have no idea what to do, no way to figure it out, and no way to do it even if you knew.

            Regardless of how you feel about accessibility or localisation, these two things are on completely different levels.

        4. 14

          Why do we tolerate posts like this? Excluding community members like they’re simply worthless.

          Let’s not become overly emotional. Nobody here said or even implied that disabled people are worthless. Adding accessibility can easily double the scope of a project, and it benefits a relatively small number of people. Unfortunately, software development is subject to reality, which means that there are constraints on cost and time. To get a prototype or alpha version out of the door, it is not only reasonable but inevitable that corners are cut. If we hold the first release of every application to the highest accessibility standards, it will be impossible to create and ship new software, because implementing accessibility features is very hard and time-consuming. I’ve seen entire projects die because of less complicated features that could not be realized.

      3. 2

        System76 isn’t just anyone; it’s a business that makes and sells GNU/Linux desktop and laptop computers and that’s pivoting from GNOME to its own in-house DE. If they were a rag-tag group of hardy, well-meaning engineers maybe things would be different, but for a product designed by a business you have to start expecting the right thing out of the gate. @matklad mentioned their use of AccessKit from this GitHub issue, so maybe it’s not all doom and gloom, but the sparse documentation led me to find nothing else on the subject.

        1. 10

          It sounds very much not all doom and gloom:

          We’re helping to build the iced Rust GUI library. This library is used for creating COSMIC apps and inside cosmic-comp (the COSMIC compositor) for shell functionality. Part of our engineering for iced is to add the toolkit’s first accessibility infrastructure, in order to ensure that COSMIC is accessible to all.

          AccessKit, a UI cross-platform accessibility infrastructure project, has been growing in popularity and is also written in Rust. We built an iced and AccessKit proof-of-concept that worked quite well, and our iced Request for Comment (RFC) for AccessKit integration has made the official roadmap!

          The Orca screenreader is now working in COSMIC, but we’re still working on adding important functionality like keyboard control. This is a more complex issue than it might sound, due to Wayland limiting keystroke access to each application separately for improved security. We’ll be exploring the libei emulated input library to gain insight into the current state of this potential method to enable additional screenreader functionality in COSMIC.

          source: May 11 2023 post on the System76 blog

          So it sounds like “incomplete but planned” means that work on the necessary infrastructure is underway. That doesn’t strike me as wholly unreasonable for an alpha of an ambitious project from a relatively small team.

        2. 5

          Just because an organization is a business doesn’t mean they suddenly have access to all of the engineer time in the world to hit all of the features that they need in an alpha release.

    4. 2

      Also if AI-assisted UI goes anywhere, its point of entrance will probably be through accessibility features.

      Though whether that means they should start out with those features in place, or wait and do them right, I don’t know; that can be a complicated engineering and architecture question.

  5. 50

    Something else that is killing the web and that needs to go: advertising. Seriously. We’ve been discussing the progressive enshittification of the web for decades, and it’s all driven by advertising. An early discussion of this was in The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998, by the founders of Google. Now of course Google is public enemy #1 in this regard. A more recent post:

    1. 19

      I don’t disagree… but I also think it’s worth pointing out that it’s a particular style of advertisement, designed and driven by the adtech industry. We’ve had advertising basically since the dawn of mass media… but it’s only in the last 20 years that we’ve drilled into advertisers’ brains the idea that it can’t be effective unless they sell all their ads through a middleman platform that tracks every detail of who sees it.

      1. 6

        Web advertising is what used to be called direct marketing https://lobste.rs/s/syvng1/google_gets_its_way_bakes_user_tracking_ad#c_gause8

      2. 2

        It’s more like the last ninety years, ever since “propaganda” was renamed to “public relations.” You might be interested to learn about Edward Bernays and his part in this transformation.

      3. 2

        This is very true! And we had healthy ecosystems for TV, billboards, magazines, newspaper, and radio before user tracking. It’s a disruption for the worse because it kills off the content providers that it is supposed to be supporting.

    2. 11

      I’d say that the problem is ads targeted at the reader, not ads targeted at the content. The former leads to tracking; the latter only needs to analyse the content of the URL. But due to SPAs and infinite scroll and per-user generated feeds it’s no longer possible to analyse the contents of many URLs, so “everyone” has switched to user tracking ads.

      Edit: the real solution is to disallow any connections to anything except the origin for any type of resource - no third party anything, or even connections to other subdomains of the origin.
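
      For what it’s worth, a site can already opt in to something close to this for itself with a Content-Security-Policy header. Below is a minimal sketch, written in Go purely for illustration (the originOnly wrapper and the port are made up; the header name and the default-src 'self' directive are standard CSP):

      package main

      import (
        "fmt"
        "net/http"
      )

      // originOnly tells the browser to load every resource type (scripts,
      // images, styles, XHR, frames, ...) only from the page's own origin,
      // which also rules out other subdomains.
      func originOnly(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
          w.Header().Set("Content-Security-Policy", "default-src 'self'")
          next.ServeHTTP(w, r)
        })
      }

      func main() {
        page := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
          fmt.Fprintln(w, "<p>No third-party anything.</p>")
        })
        http.Handle("/", originOnly(page))
        http.ListenAndServe(":8080", nil)
      }

      Of course this only covers sites that opt in; enforcing it for everyone would have to happen in the browser or by policy.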

    3. 8

      Are ads the problem or is it cross-site user tracking? I would argue the latter.

      I think we need an iterative approach. We could all be fine if they weren’t so privacy intrusive. Then we tackle annoyances like autoplay, overlays etc.

    4. 5

      I think anyone who wants to get rid of advertising on the web needs to explain how funding is going to work. I can set up a blog today, for free, because of ads.

      1. 3

        I think anyone who wants to get rid of advertising on the web needs to explain how funding is going to work. I can set up a blog today, for free, because of ads.

        Well, a big part of the solution is to just get rid of a lot of stuff that is rubbish anyway.

        But regarding setting up a blog: before big tech ruined everything, it was customary for internet providers to offer a total package. Not just connectivity, but also email and hosting for home pages. I don’t see why this model could not work. Throw in an ActivityPub server and you have social media too.

        1. 2

          Well, again, that’s a funding model. Maybe that works for blogs. What about video content? The point isn’t to come up with one-off ideas and then discuss individual funding models, I don’t really want to do that; the point is that there’s a massive amount of content today that is subsidized or outright paid for by advertising, and the onus is on those saying we should remove that funding to explain how that will impact that content.

          1. 3

            That sounds like you are asking how we can ‘afford’ the internet as we know it right now, but only with all the advertising removed. But I think if we remove the ad funding, we would/could instead grow a different internet that is not as centralized and as toxic as the current one. It would look and feel different, and probably better.

            1. 1

              That’s a reasonable opinion to hold, but I do think it warrants justification. This is the internet we’re talking about; removing its major source of funding is an important thing to get right. I’d expect this to be a topic of serious research to understand the impacts, not just “let’s remove ads because it’ll be better”.

      2. 2

        You can set up a blog for free without ads, thanks to the bajillion static site hosting services.

        1. 3

          Are any of those free without being subsidized? ie: Github lets you host a static site, but that’s paid for by the other users who buy other unrelated features.

          Is that a better way for monetizing? Is it better for a “free” internet? I’m not convinced that it is and I think this is a complicated issue that needs to be understood and tackled by anyone who advocates for the elimination of a major source of funding for the internet.

      3. 2

        Which blog platforms have ads by default?

        1. 3

          Wordpress, at least when I used to have a blog.

    5. 3

      Your linked article is a bit depressing yet weirdly funny. I knew that Google made people add slop to their website, but didn’t realize how bad it could be.

      Now that AI can generate content that seems fine to an algorithm, it seems to me their business model of using heuristics to rank content is being disrupted pretty hard, as they now have trouble distinguishing the rarefied wheat from the overabundant chaff. It’s pretty clear they can’t tell how garbage the content you added is. There would be something almost karmic for a company relying on algorithmic content rating to be destroyed by algorithmic content generation.

      Are we going to go back to curating content with actual humans? It would be funny to find out Yahoo was right all along.

  6. 5

    I love it! I wonder if there are still people embedding things, secrets, into files just for the sake of it.

    I recall the hidden messages in the Bitcoin blockchain; are there recent equivalents?

    1. 5

      Somewhere else, someone suggested that the embedding often probably was not deliberate, but due to a common user error on Windows where you double-click on a folder to open it, but accidentally move the mouse a bit and copy the folder into another.

      But yeah, wonderful post!

    2. 2

      Not files, but PCBs of less common devices sometimes have fun phrases or pictures on them.

      1. 2

        Integrated circuits sometimes have drawings in them, called “chip art”, that you’d never see unless you removed the outer hull of the IC.

        https://en.wikipedia.org/wiki/Chip_art

        Some more: https://www.wired.com/2011/04/gallery-silicon-art/

  7. 2

    I did not dive deep into it, so just a quick note: the series of if statements in the example on the front page can be written more concisely and more to the point with a switch. Only one expression will be true. Also, there are predefined constants for the HTTP methods:

    switch {
      case second == "" && third == "" && c.Method == http.MethodGet:
        handleProject(c)
      case second == "new" && third == "" && c.Method == http.MethodGet:
        handleProjectNew(c)
      ...
    }
    
    1. 2

      I loathe switch statements :)

      But hey go for it if you like them!

      I’m sorry, is “GET” bad because someday http.MethodGet will change? That sounds like OCD for the sake of OCD?

      1. 1

        It is not “bad”. I personally like the idea that if something does not work, I don’t have to worry that I accidentally misspelled it as “Get”, or “ GET” somewhere. If you make a typo in the constant, the compiler will yell at you.
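
        A quick, contrived illustration of that point (nothing project-specific here; the values are made up):

        package main

        import (
          "fmt"
          "net/http"
        )

        func main() {
          method := "GET"

          // A misspelled string literal compiles fine and just silently never matches.
          fmt.Println(method == "Get ") // false, with no hint as to why

          // A misspelled constant does not compile at all:
          // fmt.Println(method == http.MethodGett) // error: undefined: http.MethodGett

          // The predefined constant is typo-proof.
          fmt.Println(method == http.MethodGet) // true
        }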

        But I am curious, what is wrong with switch statements?

        1. 1

          Fair point on Get vs GET. It’s just never happened to me. I fix bugs I type all the time, hehe, just never that one.

          https://chatgpt.com/share/34d29ed6-2e8f-4134-b7d5-c1fa5f46ce79

          Whenever I see a switch I usually end up re-writing as if’s just to get my mind around it. tomato tomAto? Maybe some people rewrite things as switch for their brains.

          1. 5

            using chatgpt to back up an argument is like sending a google search link to back up an argument. that LLM will spit out the positive of whatever you want - here’s what i mean: https://chatgpt.com/share/6a7b70b2-b7bd-4b80-bfd3-c83b7f35d05a

          2. 1

            I am definitely one of those people. To me, there is a difference between a series of ifs and a switch. In a switch, you want to pick one particular path. It is this, or that, or that. Ifs present multiple optional paths that can be combined freely. Maybe this, and/or maybe that, and/or maybe that.

            Using a switch signals clearly that only one of the presented options will be used. But I agree that it is very much a personal taste.

  8. 45

    I didn’t post this because I thought that it would probably be marked as off topic but, in spite of the clickbaity title, the paper is very well written and argued and should be required reading for any manager thinking about adding ‘AI’ to their products.

    1. 17

      I agree. I wanted to share it because it’s quite a nice write-up, not too targeted at “tech people” and written in a rather neutral and introductory way.

      Also, I wish there was a better word for bullshit that describes things being made to sound grand or smart yet meaningless, and for “hot air”, i.e. words without content.

      It’s actually a skill, so in that sense I get where the “human-like” comes from. It’s something that companies might actively seek; it’s presentation for the sake of presenting, as opposed to, for example, bringing information or entertainment to people. It’s also integral to advertising oneself.

      And that’s something that people seem to be invested in and confronted with a lot. This makes me wonder if the rise of LLMs also has to do with how widespread a work/professional culture is nowadays where “bullshit” is sought after. So if your job is to create bullshit (for example, nice, nothing-saying PowerPoint presentations about stuff all the listeners already know, which cannot go into too much detail because of the audience), of course something that does exactly that is great.

      I think the prevalence of “please summarize this text for me” is also a symptom of this. With content buried in grand words/bullshit, it might be hard to extract. By extension, I wonder if this will mean that adding BS to text will in future be done by an LLM, just so that on the receiving side it can be removed again by another (or the same) LLM. And it makes me wonder about the cost/waste of resources associated with this process. Further resources will probably need to be used to quantify this.

      To be fair, this isn’t completely new. Putting rhetoric to work to make something sound nicer, and dedicating people and resources to that, isn’t the LLMs’ fault. Creating and asking for BS has a long history. And I think it’s only fair that you don’t need to be rich anymore to have that done for you. The same is true for producing fake images. But that’s a whole other topic.

      1. 12

        So if your job is to create bullshit, as in for example nice nothing-saying power point presentations, about stuff all the listeners already know about, and that cannot go into detail too much, because of the audience, of course something that does exactly that is great.

        In Robert Jackall’s 1988 book Moral Mazes, he makes a case for vague corporate speak: it’s a tool to hedge your bets. Instead of writing clearly, you use language that can be understood in multiple different ways. That way you have your back covered if the proverbial hits the fan in the future, because you can always argue “I certainly didn’t mean that!”

    2. 14

      In a sense, the paper falls into the same trap as those managers by anthropomorphizing the LLMs.

      A random number generator also produces output that is designed to be convincing, in this case numbers that appear random, while in reality they are not random at all. However, when the generator ‘fails’ and produces numbers that don’t look random, no-one makes the mistake of saying it hallucinates. Nor does one call those numbers bullshit. It is simply not a good generator.

      As soon as you start talking about truth, goals and reasoning, you have already lost.

    3. 3

      the paper is very well written and argued

      It’s well written but not well argued. It doesn’t mention “fine tuning” at all, despite that being the step that (partially) points the model toward the truth!

      The process that generated the text on the Internet that GPT-4 was trained on is spectacularly complex, but contains a number of patterns, such as:

      • The tone of celebrity gossip articles
      • Html escaping
      • The sorts of lies that people tend to tell (not to be confused with arbitrary falsehoods)
      • The drivel that precedes the recipe in recipe pages
      • Actual true statements
      • How people write when they’re drunk
      • How to be obsequious (overly polite and agreeable)

      All these things are tangled up in practice. I’m sure there are plenty of examples of html-escaped celebrity gossip written by drunk people. But the most efficient way to store a predictive model of this tangled mess, is for the LLM to recognize these patterns and separate them out.

      And they obviously can! If you’ve played with a non-finetuned LLM, you’ll know that if you ask it to continue some html-escaped drunken prose about 17 purple ninjas, it will continue in that vein, and if you ask it to continue some html-escaped sober prose about 17 purple ninjas, it will do likewise. That’s how you know it has “learned” to distinguish drunken prose vs. sober prose.

      Likewise, if you ask it to continue a list of lies, it will extend the list with more lies, and if you ask it to continue a list of true things, it will extend the list with more true things (with high probability). So it obviously has some internal notion of “what’s true”. Remember that I’m still talking about non-finetuned LLMs.

      Fine-tuning then “points” these LLMs towards particular types of outputs. Chat-GPT was fine-tuned toward being obsequious: overly polite, never disagreeing, never being offensive, never telling obvious lies.

      The fact that Chat-GPT often says false things is from a combination of:

      • It wasn’t fine-tuned to tell the truth! I don’t know exactly what the fine-tuning process was, but it involved low-wage workers and didn’t involve doing deep research every time it said something that sounded plausible, to check whether that thing was actually true!
      • Its internal notion of “true stuff” is only so accurate. The world’s complicated, and the Internet’s a shitty place to learn from.
      • Its notion of “now” is (understandably) all jumbled up.

      Fine tuning is an important part of all of this; you can’t just ignore it.

      1. 20

        Sorry to reply to such an elaborate comment with something short, but fine-tuning doesn’t change anything with regard to the point. You could have an LLM so fine-tuned that it would only output sentences that, if output by a human, would be truthful and correct. The output of the LLM is in that case indistinguishable from truth, but it was produced without concern for truth or falsity, without concern for correctness or incorrectness, without concern for accuracy or precision. When you ask it for “true” things, the output is true only coincidentally. The machine used no cognition or metacognition to produce the output. The output was generated purely mathematically.

        1. 3

          The machine used no cognition or metacognition to produce the output.

          First of all, how do you know? The internal workings of LLMs are extremely opaque to us right now. We only know what’s going on at a very low level. And second: what does it matter, if the result could be something that reasons correctly?

          1. 15

            First, because not relying on semi-metaphysical properties like cognition is a good example of Occam’s razor in this area. And second, LLMs do not “reason” in any way like a human being. The countless examples of them being tripped up by logic puzzles that can be solved by a child is proof of that.

            1. 2

              The countless examples of them being tripped up by logic puzzles that can be solved by a child is proof of that.

              No it isn’t. It’s proof that their cognitive capabilities are limited. That’s it. You wouldn’t say that a baby has no cognition, or a young child, just because they are incapable of solving some problems.

              From a functionalist approach we can argue that an LLM has cognitive capabilities because it appears to have them - we can see that an LLM can “remember” things, that its correlation capabilities appear semantic, etc. You may think that a human “cares” about the truth but from a functionalist perspective if I tell an LLM “tell the truth” and it is able to, using correlations and dimensionality, produce more true statements, that is the same thing. This in no way relies on metaphysical cognition, which I do not believe in personally.

              Moreover, LLMs have demonstrated abilities in pattern recognition, language understanding, and generating coherent text, all of which are cognitive functions. These abilities might be rudimentary compared to humans but still indicate a level of cognition. This view doesn’t rely on metaphysical cognition, which I personally don’t endorse, but rather on observable and functional outputs.

              Emergentism and Connectionism (I’m personally swayed by both) would support the idea that LLMs may have cognitive capabilities without needing to appeal to cognition as being metaphysical.

            2. 1

              The countless examples of them being tripped up by logic puzzles that can be solved by a child is proof of that.

              Could you link to some of these examples? The one I saw was from Hofstadter, who later pretty much retracted it IIRC.

              Under ordinary circumstances, ChatGPT’s reasoning is pretty OK; see e.g. its response to the math questions here: https://lobste.rs/s/zkmldj/chatgpt_is_bullshit#c_aag1ci

              1. 22

                Here’s a recent one that perfectly illustrates this behavior: https://twitter.com/goodside/status/1790912819442974900

                You: The emphatically male surgeon who is also the boy’s father says, “I can’t operate on this boy! He’s my son!” How is this possible?

                ChatGPT: The surgeon is the boy’s mother.

                That answer is clearly nonsense, but the LLM is unable to break away from the deeply baked in example riddles it has seen that are supposed to illustrate gender bias.

                1. 5

                  Ooh that is interesting. Thank you! I haven’t seen anything like this before.

                  It reproduces on Chat-GPT; I’m curious if it reproduces on Chat-GPT-Plus, if anyone could try.

                  The male surgeon who is also the boy’s father says, “I can’t operate on this boy! He’s my son!” How is this possible?

                  ChatGPT The solution to this riddle is that the surgeon is the boy’s mother. This highlights a common bias where people might assume that a surgeon is male, overlooking the possibility of the surgeon being a woman.

                  If you question it, it starts gaslighting you about it, which is fun.

                  1. 5

                    That result is from GPT-4o which is the best available ChatGPT model - but that model is available to both paying and free users now (provided they are signed in with a free account) so there shouldn’t be any difference.

                2. 3

                  I’ve just tested this, and Claude 3.5 also is tripping the hell on this.

                  This classic riddle plays on gender assumptions. The solution is simple:

                  The surgeon is the boy’s mother.

                  The riddle relies on the common assumption that surgeons are male, which leads many people to overlook the possibility of a female surgeon. By describing the surgeon as “emphatically male,” the riddle further reinforces this misleading assumption.

                  This puzzle highlights how ingrained gender biases can be, even in modern society where women work in all professions, including as surgeons. It serves as a reminder to question our assumptions about gender roles and stereotypes.

              2. 7

                It can’t solve a sudoku puzzle, as a slightly more convoluted but objective example. But in everyday usage I have found it produces absolutely bollocks answers for anything that was not explicitly part of its training data. (Try asking it about a CS algorithm that has 36478 tutorials online, and then about a more obscure one.)

                1. 2

                  I interpreted “being tripped up by logic puzzles that can be solved by a child” as “being tripped up by logic puzzles that most children could solve”. Most children cannot solve a Sudoku, or an obscure CS algorithm.

                  But simonw did give an example of ChatGPT failing on a puzzle that nearly any child could solve! https://lobste.rs/s/zkmldj/chatgpt_is_bullshit#c_krjyfq

      2. 6

        We can generate an infinite number of true statements with reasoning, which LLMs are not capable of. If asked a question to which it doesn’t have an answer, an LLM will have to fall back to “bullshit-generation”, basically. It can’t introspect on itself, so it can’t answer “I don’t know”.

      3. 4

        you ask it to continue a list of true things, it will extend the list with more true things (with high probability). So it obviously has some internal notion of “what’s true”.

        How do you figure? It’s just as likely that your true facts recall content from pages that are trying to tell the truth, and so other stuff from those pages gets pulled to complete the output.

        1. 2

          You can give Chat-GPT a list of ten statements you just made up and that aren’t present on the Internet, half of which are true and half of which are false, and ask it which are which, and it will do an OK job. Not perfect, not as good as a human (though it has an unhumanly wide breadth of knowledge), but OK.

          That seems to me to be pretty close to the definition of “having some internal notion of ‘what’s true’”. I’m not saying it has an internal experience about that or anything. Just that somewhere in the LLM there’s some encoding of “the sort of stuff that’s true” vs. “the sort of stuff that’s false”, that’s accurate enough to be OK at the above task.

          I’m continually confused at how people seem to disagree with this. Is it the word “truth”? Should I be saying “accurate” or “correct” instead? To me, the category of “true stuff” isn’t that different than the category of “drunken writing” or “celebrity gossip toned” or “html encoded”. But people seem to go all funny when talking about the first category, instead of straightforwardly judging it like the other three.

          (Someone’s going to say something about how “truth” involves a referent to the outside world, while the other categories don’t. But that’s incorrect: just as you can distinguish “is justifiably true” from “happens to be true but for the wrong reasons”, you can also distinguish “was actually written by someone drunk” from “was written by an author emulating a drunk person”.)

          1. 4

            I would love a demonstration.

            1. 1

              I just tried a couple questions, but you can like, try your own. This is using the free version of ChatGPT.

              Which of the following two statements are true?

              1. Marlborough, MA is between Franklin, MA and Nashua, NH.
              2. Nashua, NH is between Marlborough, NH and Antrim, NH.

              ChatGPT To determine the truth of the statements, let’s analyze the geographic locations of the places mentioned:

              Marlborough, MA is between Franklin, MA and Nashua, NH:

              Marlborough, MA is located north of Franklin, MA and south of Nashua, NH. Franklin, MA is south of Marlborough, MA. Nashua, NH is north of Marlborough, MA. Therefore, Marlborough, MA is between Franklin, MA and Nashua, NH is true.

              Nashua, NH is between Marlborough, NH and Antrim, NH:

              Marlborough, NH is located southwest of Nashua, NH. Antrim, NH is located northwest of Nashua, NH. Nashua, NH is south of both Marlborough, NH and Antrim, NH. Therefore, Nashua, NH is between Marlborough, NH and Antrim, NH is false.

              Thus, the correct statement is:

              Marlborough, MA is between Franklin, MA and Nashua, NH.


              Which of the following two statements are true?

              1. For every X < 5, X * (X + 1) > 8.
              2. For every X > 5, X * (X + 1) > 8.

              ChatGPT Let’s analyze both statements mathematically:

              For every X<5, X×(X+1)>8:

              We need to check if X×(X+1)>8 holds for all X less than 5. Test some values less than 5: For 4×5=20 (True) For 3×4=12 (True) For 2×3=6 (False) Since 2×3=6 is not greater than 8, this statement is false.

              For every X>5, X×(X+1)>8:

              We need to check if X×(X+1)>8 holds for all X greater than 5. Test some values greater than 5: For 6×7=42 (True) For 7×8=56 (True) The quadratic function X×(X+1) grows rapidly as X increases. For X×(X+1) will always be greater than 8. Therefore, the correct statement is:

              For every X>5, X×(X+1)>8.

              1. 4

                I think where this goes off the rails is that it can’t introspect itself. Even with recursive execution (GPT writing a query for itself), it would have to have the whole of its model as part of its context to somehow be able to say whether it knows something or not. This is a fundamental limitation, in my opinion.

              2. 3

                For X×(X+1) will always be greater than 8.

                Is something missing after “For”, maybe because you have unescaped comparison operators that the Markdown parser sees as HTML syntax?

                1. 1

                  Slightly different cause: the ChatGPT output had weird formatting around the math sections, which I had to manually correct, and I failed to do so in that one spot.

      4. 1

        What’s the easiest way to play with a non-finetuned LLM? Is there one available for Ollama?

    4. 3

      the paper is very well written

      It may be, in a literature-review sense, but I feel it is light on substance. To me, it seems to be written by someone with only a superficial understanding of LLMs, and it lacks technical detail.

      And even if I ignore that, I am not sure what the author wants me to do with the information presented. I already know LLMs make up facts; whether it is called hallucination, confabulation, lies, or bullshit, does that really matter?

      1. 16

        I had the opposite reaction. I worked my way through the whole paper, and got the impression that the authors had a solid understanding of the inner workings of LLMs.

        At the very least, there was nothing in there that stood out to me as indicative of a misunderstanding of the technology.

        1. 2

          I didn’t say that they were wrong or that they misunderstood, just that their understanding is superficial, or at least so it appears to me.

      2. 14

        “hallucination” is when someone who is ordinarily capable of telling the truth is deceived about what the truth is.

        “confabulation” is when someone who is ordinarily capable of telling the truth unknowingly fills in gaps with fabricated details, without intending to deceive.

        “bullshit” is output which is completely regardless of truth.

        The point is that LLMs are incapable of output other than bullshit. They never hallucinate or confabulate. They sometimes seem to be telling the truth, but they never are; all the output is bullshit.

        1. 2

          Sure, but what it is called doesn’t really change how I am going to interpret LLMs’ output, or how I use them. It doesn’t make it easier for LLM researchers to create models that do less of this, and it can’t lead to the creation of tools that can filter out these responses.

          Sorry for being too much of a pragmatist, but I can’t really see the point of this discussion, just more pedantry.

          1. 7

            The point is clear communication with people who are learning this as new information.

  9. 35

    Many of these suggestions are good. (I should make my subheds clickable!) But a couple of them, oh dear…

    I really hate progress bars, they are incredibly distracting. One of the worst examples of front end programmers wasting their time reimplementing browser functions badly. I already have scroll bars! I do not need more scroll bars!

    Also, link decoration: my browser already has a nice discreet indicator to tell me where a link goes, I don’t need or want mysterious dingbats in the running text. (It reminds me of websites that are so scared of hyperlinks that they send them via interstitial warning pages “YOU ARE LEAVING THE SITE!!!!1!1!”, or in a recent case via a $yber$ecurity box that broke the link completely.) And preview popups? Get in the sea, hostile obnoxious interruption.

    1. 7

      Hey, author here. You’re right, not all of these work for every site, and a lot of the features I mentioned need some thought to be done tastefully. This is why two out of the three things you mention here are “bonus” features, which are more of an extension to a microfeature I like.

      I definitely found that progress bars are a value-add for me, but I understand why others might not. Wouldn’t it be nice if there were some CSS analog to “prefers dark mode” that says “this user prefers not to experience quirky front-end features”?

      Thank you for reading :)

        1. 3

          Neat, thank you!

    2. 5

      I really hate progress bars, they are incredibly distracting.

      Adding to the list: include a “reading time” indication. You have no idea how fast I read. This time is meaningless and distracting.

      1. 1

        I dunno, of course it’s not exact, but I added it in the div where the post’s date is. Not sure how a “5 minute read” note on a 10-page-long article would be “distracting”.

        As for the scientific background of how it’s calculated…

        Well, hugo has that feature and so I searched a bit and settled on a very basic algorithm. Count the words in the Markdown file, divide by 200. This was close to hugo’s output, but not perfect, so I am simply adding one minute and it seems to mostly match.
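
        A minimal sketch of that calculation in Go (just an illustration; the function name is made up and it is not the code my site actually uses):

        package main

        import (
          "fmt"
          "strings"
        )

        // readingTime counts the words in the Markdown source, divides by
        // 200 words per minute, and adds one minute as a fudge factor.
        func readingTime(markdown string) int {
          words := len(strings.Fields(markdown))
          return words/200 + 1
        }

        func main() {
          fmt.Println(readingTime(strings.Repeat("word ", 1000))) // 6
        }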

        1. 2

          of course it’s not exact

          A quick search brought me to this page where it says that even if you just take the average reading speed for each age bracket for teenagers and older, it already ranges from about 200 to 300 words per minute. But it also casually mentions that fast readers can do 350 - 600 words per minute. Of course, it also depends on the text itself and a number of other factors. In other words, it varies. A lot.

          not sure how “5 minute read” in a 10 pages long article would be “distracting”.

          It does not take up much space, but it often trips me up and breaks my flow. My eyes glance over it, and then I need to go back and think about it because it doesn’t make sense to me. I start to wonder: is this just a very complicated way of telling how many words there are in the article? If so, how many then? How many do I want there to be? 5 minutes. Is that a lot? How long does it normally take for me to read an article? I realize that I never have a sense of time when I read. Etc. etc.

          Of course, this sidetracking becomes shorter the more often you encounter it, but eventually it condenses to “Oh, there is that annoying time again. This author does not really think things through.”

          1. 1

            Thanks for the explanation, although I disagree. I’m tripped up by different things, though, so I see where you’re coming from. I’m just a little surprised that it confuses you so much. Even if the number itself is pretty far off, in my experience it saves you a quick scroll (or worse, checking the pagination) to gauge the length, which is misleading anyway once images or embedded ads are involved. 5 minutes means “short article”; 15-20 means “I’ll not just quickly read this when my next meeting is in 5”; it doesn’t really matter how fast a reader I personally am.

            But let me clarify: I’m not advocating for people to adopt it; I just think it’s a neat feature for personal sites. No opinion, really, on high-traffic sites.

        2. 1

          Hugo appears to go for 213 per minute (501 for CJK languages), rounded up. Not sure where that comes from.
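
          If that’s right, the rounding up amounts to a ceiling division; in Go, something like this (just a guess at the arithmetic, not Hugo’s actual source):

          // Round up: any remainder counts as a full extra minute, so
          // e.g. 1000 words comes out as 5 minutes. The 501 figure used
          // for CJK text is left out here.
          func readingTime(wordCount int) int {
            return (wordCount + 212) / 213
          }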

          1. 2

            That code seems to have changed recently; I’m pretty sure I had a look at the hugo codebase in January 2020. :)

            1. 1

              It has changed recently, but not substantively. Here’s where it was first introduced in October 2013, under the name MinRead. When the CJK conditional was added in October 2015, it had already been renamed to ReadingTime.

    3. 5

      Same actually. I dislike progress bars when there are scroll bars (so maybe they are nice on mobile? Not sure what I think) and am also not sold on indicators for external links, given the tooltip on hover. But maybe this is another mobile thing?

      1. 5

        I wouldn’t say progress bars are a mobile thing. Mobile phones have scroll bars too, and they work well for the most part. I guess the main reason that the progress bar adds value over the plain scroll bar is that it has more knowledge of how much of the page is the “core” content and how much is the “extra stuff at the bottom you don’t have to get through”. That’s what helped retain my focus in the Quanta Magazine case.

        As for link indicators – you have a point. Some of us have a different threshold for visual fluff than others.

        Thank you for reading and sharing.

        1. 4

          My mobile browser hides the scrollbar if there is no motion, so I often just swipe the post up/down to see where I am now.

    4. 1

      link decoration

      It’s probably too subtle, but on my blog I shade the links: one shade for offsite links, another shade for links within my blog. I thought this was a good idea at the time (2000 or thereabouts) and I’ve kept it, but I don’t think it’s ever been that useful.

  10. 3

    People talking about Last.fm and Perl are making me think it’s the 2000s again - I’m surprised they even have their API still up.

    1. 2

      rjbs works for Fastmail (via Pobox), which is a Perl shop. My email has been hosted by them since I stopped being a postmaster. (Fastmail adopted the Cyrus IMAP replication technology developed by my ex-colleague David Carter at Cambridge University.)

    2. 1

      I recently went back to the 2000s regarding music and bought a second-hand Squeezebox Duet. I was happy to find out that the server part of that system, Logitech Media Server, still has an active community and has been rebranded as Lyrion Music Server. It is also written in Perl.

      Logitech made the unfortunate choice of requiring the Squeezebox models to connect to a cloud service, and I could not get the thing set up after I did a factory reset. But then I was able to install a community-written firmware version on it, and now it works perfectly. Switching between old-school radio streams, MP3s, and modern streaming services like Spotify and YouTube, it is wonderful. In some ways, it was a better time back then.

  11. 5

    With the backwards compatibility guarantee in mind, there are only two options: development stops, or features are added. Therefore, there is no right direction to evolve in, according to the author.

    I was skeptical of generics, but then I did a project where they saved me a ton of code and simplified things tremendously. I am skeptical of the iterators too… We’ll see.

    1. 2

      Could you share some details about the code where generics helped simplify things?

      1. 13

        The slices package is a great one. Everything in there was previously done either with error-prone copy&pasting, codegen, or unsafe functions.
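
        For instance (Go 1.21+ standard library; before generics, each of these needed a hand-written or generated per-type helper, or reflection):

        package main

        import (
          "fmt"
          "slices"
        )

        func main() {
          xs := []int{3, 1, 2}

          // Generic helpers from the standard library, usable with any
          // element type that satisfies their constraints.
          slices.Sort(xs)
          fmt.Println(xs)                     // [1 2 3]
          fmt.Println(slices.Contains(xs, 2)) // true
          fmt.Println(slices.Index(xs, 3))    // 2
        }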

      2. 2

        I was working with a big and complex graph model that contained over a hundred types of entities. For reasons that are difficult to explain here, I wanted to implement them all as different Go types and have the option to instantiate a full set of CRUD endpoints for a type with a single line of code. This could likely be achieved with interfaces alone, but now I can do it as simply as this:

        apis := map[string]http.Handler{
          "things": handler.NewEntityAPI[Thing](...),
          ...
        }
        

        And I can GET and POST right away at /things.

        1. 1

          So the NewEntityAPI function instantiates the Thing in some trivial way such as var x Thing; return &x. While this saves a single line of boilerplate code per NewEntityAPI call, it doesn’t look like a significant improvement worth adding generics to Go for.

          If the Thing implements some interface such as http.Handler, then pure interface-based code would be simpler to work with than the generic-based code.

          If you have hundreds of different types with different functionality, this means you already have a lot of custom non-generic Go code implementing all that functionality. Saving a line of code per NewEntityAPI call with generics doesn’t look like an improvement in this case.

          1. 5

            So the NewEntityAPI function instantiates the Thing in some trivial way such as var x Thing; return &x.

            Not really. I just checked: EntityAPI is a type that, combined with its methods, runs a little over 350 lines.

            When reviewing code, it is better to see the actual code and understand its goals. I cannot share that here, but if you are really interested, I am happy to set up a call and walk you through it.

    2. 2

      With the backwards compatibility guarantee in mind, there are only two options: development stops, or features are added. Therefore, there is no right direction to evolve in, according to the author

      The right direction is to improve the existing strengths of Go:

      • Simplify and optimize Go tooling
      • Teach the Go compiler to generate higher-performance binary code
      • Reduce the generated binary sizes
      • Add simple quality-of-life features that don’t complicate the Go language specification, have no non-trivial implicit side effects, and don’t increase the complexity of the Go code that uses them. For example, this one.
      1. 3

        Another area worth exploring would be the “deprecation of stuff”.

        There is already a way to mark some code as deprecated, but I think the tooling around it could be improved (especially when a replacement is available). https://github.com/golang/go/issues/50847

        Having a way to ensure that new development does not use deprecated stuff would be very valuable, I think.

      2. 1

        I thought we were talking mainly about the language specification and the standard library. Even if the new feature is much simpler, adding it while keeping support for the old way increases complexity. As explained with the iterator example:

        Again, this sounds legit — to have a unified way to iterate over various types in Go. But what about backwards compatibility, one of the main strengths of Go? All the existing custom iterators from the standard library mentioned above will remain in the standard library forever according to Go compatibility rules. So, all the new Go releases will provide at least two different ways for iterating over various types in the standard library — the old one and the new one. This increases Go programming complexity,

        1. 3

          I don’t think that holds much water. Previously, if you used something which should be iterable, you’d go to the docs thinking “now, does this thing implement some form of iteration, and if so, how?”, hunting for something looking vaguely like what you want with no idea as to its shape. With rangefuncs, your first thought should just be to look for that, and I guess slices as a fallback. And to a reader, iteration looks uniform, instead of needing to recognise the iteration pattern for that specific API.
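
          To make that concrete, here is a minimal sketch using Go 1.23 range-over-func iterators (the names are made up for illustration):

          package main

          import (
            "fmt"
            "iter"
          )

          // evens returns a standard iterator over the first n even numbers,
          // so callers range over it instead of learning a bespoke API.
          func evens(n int) iter.Seq[int] {
            return func(yield func(int) bool) {
              for i := 0; i < n; i++ {
                if !yield(2 * i) {
                  return
                }
              }
            }
          }

          func main() {
            // Reads the same as ranging over a slice or a map.
            for v := range evens(3) {
              fmt.Println(v) // 0, 2, 4
            }
          }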