I used to be a Gentoo user in its early days. It’s great. It also sparked a lot of cool technologies, including what is probably the best init system for Linux, OpenRC. They did really cool stuff, and they used to have (still do?) one of the biggest package collections. I’m speaking about quality packages here, not AUR - which I love, but is… well, it’s what you get when everyone can just add a package.
Another strong suit used to be portability. There was a time when I loved getting OSS to run on exotic hardware and such, and Gentoo beat Debian on, for example, PA-RISC.
Later I was surprised to find that people at my university ran Gentoo images on the university’s computers at large scale. Gentoo had felt a bit niche before that.
There was also a time when a lot of leading-edge security work happened there, because it was easy to just build things a certain way and get a patched kernel.
While I haven’t used Gentoo in two decades or so, I think there are many “small” (in popularity, not in work or impact) things there that really advanced Linux and the whole ecosystem, at least as inspiration and prior art.
Check it out again :-) they’ve done a lot of work to improve the experience, including a binary package host for faster builds. I run it on my VPS and love it.
After reading, I still don’t think I fully understand what Antithesis is. I think the bit that’s getting me is how something could be completely reproducible unless you ran it on actual hardware.
My guess is that making a deterministic hypervisor involves a lot of kernel hacking, and the BSD source code is much smaller, more coherent, and maintainable by fewer people than Linux’s.
Linux is like a big sprawling thing that uses every trick in the book, with code from thousands of people and hundreds of different companies.
Also I recall hearing specifically that the hypervisor support in Linux is a nightmare. This might have been from some talks on Solaris/Illumos (e.g. from Bryan Cantrill), which shares some lineage with BSDs
(I am not a kernel hacker, nor do I have any first-hand knowledge, so take this with a grain of salt. However, I do know people who have worked on high-perf networking in the kernel, and they say there is a stark difference between the BSDs and Linux)
I suspect this is the main reason; another may be the license, since bhyve and FreeBSD are licensed under the BSD license, which is more permissive than Linux’s GPLv2.
I sent an email to Antithesis and received a reply from their CEO:
We decided to start with bhyve because it’s a very simple hypervisor with very few features, so much easier to modify with the deep changes we needed to make in order to make it deterministic and have copy-on-write snapshotting (we wrote about some of those efforts here: https://antithesis.com/blog/deterministic_hypervisor/).
The alternative we considered was to write our own hypervisor completely from scratch. We looked at kvm, but decided its architecture was too complex to allow deep modification.
Yes, I regularly remove stories that seek customer service via public shaming. I don’t want Lobsters used to whip up a mob in an outrage and direct them at targets. It always feels righteous at first and becomes an awful tool for abuse.
I don’t think it’s right to call this a post seeking customer service. It’s a post calling attention to a policy change made by Automattic that immediately affected the privacy of all Tumblr users, and all Wordpress.com users. Users of those platforms have to revoke the consent that Automattic assumed it had - wrongly, in the view of OP. Is that outwith the scope of Lobsters?
Tumblr & WordPress posts were already being scraped by AI companies, just like the rest of the public Internet, and users had no control over it.
Automattic just gave control to users, allowing them to increase privacy. The opt-in nature means there is no change in behavior unless the user chooses to take action, which seems reasonable to me.
It is explicitly opt-out, not opt-in. The original post rightly flagged this as a problem of the platform assuming user consent.
Content scraping is already happening, yes, but as SoapDog said below, Automattic is directly profiting from this sale of data, and that data originally dumped by Tumblr included private and deleted posts, not just public ones. It’s unclear whether this was given to OpenAI.
As far as I am aware, it is “opt out” and not “opt in”, and that is quite different. Also, posts being scraped by AI companies harvesting the Web is already a problem, but then the problem is with the AI companies. Automattic PROFITING from users’ content by packaging it and selling it without consent is even worse and makes it their problem.
The opt-in nature means there is no change in behavior
What opt-in nature are you referring to? The post is complaining about Automattic creating a new default: selling your data directly to AI companies, without compensating you, unless you explicitly opt out. Sure, for most sites, AI scrapers could do that anyway just by ignoring your robots.txt, as always. But there doesn’t seem to be any opt-in facet to this change that I can see.
There is no change in behavior by default: your data was scraped by AI companies before, and it still is. The change is that you are now allowed to opt out. That sounds positive to me.
Before, only your public data was being scraped by AI companies. I believe that public data will continue to be scraped by AI companies regardless of whether you use Automattic’s opt-out mechanism. It’s public, after all.
Now, Automattic is offering those AI companies more of your data. And they will share that by default unless you opt out.
If all of the data you share with Automattic is public, it seems like no change to me.
If some of the data you share with Automattic is not public, it sounds like a significant downgrade to need to opt out.
Grabbed the link from the moderation queue to see what the fuss was about.
The post was not asking for customer service; instead, the change in said service was the catalyst for discussing a broader issue in customer-generated content and the policy surrounding it.
For those wondering, a summary of the article: “Opt-out isn’t a good model when it comes to handling scrapers and similar, and in continuing to legitimize this behavior, companies that engage in this are eroding the discussion around the consent of a company’s handling of customer data. Automattic has decided to engage in this behavior, and I happen to pay Automattic to host my blog, but the issue is far greater and I felt the need to speak on it.”
This is voicing displeasure with a policy change and UX dark-patterns that enable technical actions which are not always in the better interests of users/customers/etc. This is not “uwu automattic locked me out of my tumblr for women offering plumbing supplies they suck go beat on their door”, it’s a discussion on “just how much leeway does a company have on the data that they store for a customer and when is it better to ask permission rather than require explicit disapproval?”
Why might companies make a policy change like this?
What engineering can we do to better allow for consent in a profit-seeking context?
Is there a fundamental mismatch between what users say they want, what users actually use, and what a reasonable implementation of consent looks like?
What are past examples, going back years, of violating (or manufacturing, to use an old phrase) consent by these companies? Is it truly just a recent thing?
The blogpost doesn’t really do any of those things, and certainly not at a level beyond the most simple, knee-jerk, and facile.
I am unaware of lobste.rs enforcing such criteria in the past or having a general rule against “screeds”. There’s even a rant tag. And plenty of “simple, knee-jerk and facile” posts show up here and don’t get removed.
So I think you will need to find a better argument against the post in question.
My argument there is not with the submission (though I complain about that elsewhere), but with the comment I’m replying to: the comment claims that the post is a discussion; I claim the post is merely a screed - and not a particularly good one at that - and give some examples of what would elevate it.
By the way, the rant tag also has some effects that hint that it isn’t the preferred content here:
(from rant description) Hotness modifier -0.25 (lowers a story’s rank). Tag is not permitted on stories submitted by new users.
Some good rants make it through of course.
And plenty of “simple, knee-jerk and facile” posts show up here and don’t get removed.
The article in question seems very relevant and educational to the Lobste.rs community given the tech industry’s poor understanding of consent and widespread abusive practices around user data.
As far as I know, based on logic and personal experience, the tech industry understands consent quite well. Decisions to make things opt-out are not accidental. It’s a business decision made in consultation with legal, based on a desired outcome. There’s no technical question, and I doubt this is a surprising situation to anyone involved in implementing an opt-out like this.
IMO you provide an excellent example. You are confusing “do I have consent?” with “is it legal?”.
Their users consented to use of their data for providing blogging services. Selling that data to a 3rd party for other, unrelated reasons is unethical because the user did not know about this possible use when they signed up for the service. It’s unethical to automatically opt them in.
You are confusing “do I have consent?” with “is it legal?”.
I don’t think he was. From what I’m reading, he was politely saying that those tech businesses deliberately ignore consent, and instead just look at money and law. Cynicism, not incompetence.
Now if I were asked to implement that kind of opt-out dark pattern, I would definitely consider answering “sorry, find someone else”. I could afford it right now.
I totally get that. I trust that is exactly the discussion that business had with legal (and corporate comms and marketing and government affairs). And the only technical option you have in this situation is to find another job. So while this is a completely valid and necessary topic, it doesn’t seem (to me) like Lobsters is the right place for it.
I agree. There’s no technical difficulty in making something “opt-in” instead of “opt-out”, and a discussion of why opt-in is better isn’t technical and isn’t going to make me a better programmer.
If, hypothetically, some open-source project announced tomorrow – Thursday – that as the Next Chapter of their Exciting Open Source Journey they’re switching to BSL or another “source available” license in order to better monetize the project, and
If, hypothetically, I were to write and publish a blog post the following day – Friday – talking about the philosophy and ethics of Free Software and Open Source and condemning such switches, and calling on people to vote with their wallets by ceasing use of such a project, then
Would you remove that post from lobste.rs?
I ask because so far as I can tell, such a post would not differ in form or aims from the removed post under discussion, yet posts which couch “customer service complaints” or “business news” in even the thinnest possible framing of being about FLOSS licensing and licensing ethics don’t seem to get removed the way this one did. Heck, sometimes just the pure “business news” of a license change announcement is left up as apparently on-topic.
And to register a personal opinion, I think the post being discussed here was more on-topic for lobste.rs – if viewed through the lens of “pertains to computing” – than licensing slapfight threads typically are. I also think the post being discussed here was on-topic for lobste.rs and should not have been removed.
You’re right that your article isn’t seeking customer service, but I do think that the second part of @pushcx’s comment - “I don’t want Lobsters used to whip up a mob in an outrage and direct them at targets” - is a valid choice. It’s not the choice you made for your (excellent!) blog, but I don’t think @pushcx is making an invalid choice for Lobsters.
I’d already read your article via Mastodon, and liked it - and while I agree your article is not a request for customer service, I do think one could reasonably call it advocacy.
I do hope that you’ll continue to find value in Lobsters regardless of the outcome of this thread; I’ve enjoyed reading your comments here (as well as your blog), and I wish you the very best.
I am curious to know what you think of my analogy to posts about companies doing license changes, which to me are largely indistinguishable in form from the removed post under discussion here, but somehow are still allowed (despite being “business news” and not “pertaining to computing” and often being used to “whip up a mob” and “direct them at targets”).
It might be relevant that changes to blogging platforms affect authors more broadly, while license changes affect developers in particular. Lobsters caters to both to a degree, but more to the latter than the former.
Automattic is a store of data; they also have the contact details of those affected by the decisions made (and can directly inform the users). All users are implicitly at the mercy of any changes in the ToS, and must respond / care / etc. within a reasonable amount of time.
A codebase’s only transaction with users is when those users acquire the code; at that time they can check the license and decide how they feel about it. They only need to check the license when acquiring the code, at no other time. There is no other mechanism to convey this information to the users, and the “within a reasonable amount of time” framing does not apply either.
There’s also a privacy tag here for use and abuse of one’s data – not even necessarily one’s confidential data. See, for example, threads about people leaving GitHub to avoid having their code used for things like Copilot.
So I still don’t see a meaningful difference between the post being discussed here, and many things which have gone un-removed in the past.
(First: I really appreciated your post. I was a bit out of the loop because of other things eating my attention lately, and your post did a great job of both catching me up on things I’d seen on fedi but hadn’t carefully read yet, and contextualizing the underlying consent issues. Thank you for writing that.)
It clearly wasn’t seeking customer service, but if the mod message had said “lobste.rs is not your torch and pitchfork outlet” instead, I’m not sure I’d have batted an eye. And the two messages are really mostly equivalent, IMO.
I have no power here; I simply like this place and enjoy many of the discussions that can be had here. But I don’t really feel that this site is a good place to discuss your (IMO excellent) post. It could draw a good discussion here, but it could also (more likely, IMO) draw a really terrible one. The only reason I wish it got left up on the page is because I really feel the points you made need a signal boost in the industry.
But I’m pretty sure that I’d have needed to hit “hide” on the comments to avoid being drawn into a flame war.
Thanks for all you write. I always learn something when I read it.
While I agree with the moderation decision, I was wondering if you would be open to rewording the mod messages in a more compassionate manner? I think they are a little abrasive, and the wording might be the reason why folks are getting upset.
Additionally, while all of the moderation actions are transparent, I think the guidelines for posting are difficult to find. They are buried under “tags and topicality” on the About page, mixed in with information about how the tagging and ranking system works. The orange site has a clear set of guidelines that one can find linked at the bottom of the site.
Thanks, these are all really good points. The About page started as a post about technical features, and after years of edits it really wasn’t clear what was happening in that section. I’ve lifted the topicality info up to a top-level section titled Guidelines and expanded it with sections on the site climate (where I’ve tried to capture the site’s vibe in positive terms rather than a list of “do not”s), this topic of brigading, and self-promo. I took this language from the mod log, hatted comments/DMs, and meta threads, and I’ll need to do a comprehensive review of those at some point to flesh things out. I hope folks will suggest things I’ve missed or could’ve explained better; I’m particularly not satisfied that I had to handwave a bit about where to draw the line on brigading and would like to do better than this slightly “know it when I see it” standard.
I’ll try to echo this less frustrated language in future mod messages, or otherwise make those clearer and more actionable. Thanks for the criticism.
I’m not sure if this is still the case, but one of the dangers with AVX-512 instructions is that they can cause performance degradations due to how much power they use. It would be interesting to see how this optimization held up outside of the realm of micro-benchmarks.
Now, this adjustment of expectations comes with an important caveat: license-based downclocking is only one source of downclocking. It is also possible to hit power, thermal or current limits. Some configurations may only be able to run wide SIMD instructions on all cores for a short period of time before exceeding running power limits. In my case, the $250 laptop I’m testing this on has extremely poor cooling and rather than power limits I hit thermal limits (100°C limit) within a few seconds running anything heavy on all cores.
Good to know that there’s no automatic downclocking like in Haswell/Skylake. Thanks for sharing!
The other limits would still be relevant especially in virtualized / containerized / other multi-tenant workloads, wouldn’t they?
(Nginx, Apache, Lighttpd, HAProxy, and other HTTP servers and balancers are all capable of running CGI either directly or via FastCGI; but these are too heavy-weight, thus outside of my scope;)
civetweb (https://github.com/civetweb/civetweb) – I’ve just stumbled upon this one today while looking at the options available in Buildroot; looks promising!
When I say “heavy-weight” I mean something along the lines of “when considering many aspects of the software solution, the outcome is on the heavy side”; for example (and here I speak about general-purpose HTTP servers like NGINX / Apache / etc.):
first of all, most have everything-including-the-kitchen-sink built in (TLS, ACME, caching, compression, etc.), which translates not only into large documentation, but also into a hard-to-tame beast; (does it come with caching pre-enabled? should I disable it explicitly? how about compression? how about TLS? how about AIO?)
then, all this complexity also translates into a larger attack surface; (now the attacker has a large set of potential code bases to try and exploit; perhaps a vulnerability in OpenSSL, perhaps one in the compression library, etc.);
furthermore, all this complexity also translates into deep dependency chains (that make the final embedded image larger);
However, perhaps my main issue with these general purpose servers is that they are tailored for “normal” servers (with lots of RAM and CPU, at least relative to small embedded systems), and if you want them to run well on embedded devices you’ll have to tweak a lot of knobs.
Thus, my quest for a HTTP server geared towards small / embedded deployments: I can assume that the developers have already chosen defaults that are well suited for such environments.
first of all, most have everything-including-the-kitchen-sink built in
… I think this might be a valid criticism of Nginx, due to its compiled-in nature for “plugins”, but with Apache, almost everything is provided by loaded modules. Don’t load the modules, and you don’t get the complexity/attack surface/dependencies.
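To illustrate (a rough sketch; these are stock httpd 2.4 module names, and exact paths vary by distro):

    # Load only what you actually need; everything else stays out of
    # the process entirely.
    LoadModule mpm_event_module   modules/mod_mpm_event.so
    LoadModule unixd_module       modules/mod_unixd.so
    LoadModule authz_core_module  modules/mod_authz_core.so
    LoadModule dir_module         modules/mod_dir.so
    # Not loaded, so not part of the attack surface or dependency chain:
    # LoadModule ssl_module       modules/mod_ssl.so
    # LoadModule deflate_module   modules/mod_deflate.so
    # LoadModule cache_module     modules/mod_cache.so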
A really nice error handling pattern in C++ is using something like absl::StatusOr or std::expected. I like this a lot better than exceptions since the control flow is a lot more explicit.
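For instance, a minimal sketch with C++23’s std::expected (absl::StatusOr usage looks much the same; the parser here is made up for illustration):

    #include <expected>
    #include <iostream>
    #include <string>

    enum class ParseError { Empty, NotANumber };

    // The possibility of failure is part of the signature, so callers
    // can't ignore it the way they can with exceptions.
    std::expected<int, ParseError> parse_int(const std::string& s) {
        if (s.empty()) return std::unexpected(ParseError::Empty);
        try {
            return std::stoi(s);
        } catch (...) {
            return std::unexpected(ParseError::NotANumber);
        }
    }

    int main() {
        auto r = parse_int("42");
        if (r) std::cout << *r << '\n';        // explicit happy path
        else   std::cout << "parse failed\n";  // explicit error path
    }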
The third mode, with both strict ordering and guaranteed delivery is obviously the safest. It’s also, in my experience, the most common.
In my experience, it’s very rare that you actually need strict ordering in your message queue. For instance, the delivery notifications example in the article is not strictly ordered end-to-end. Once your notifications are sent to the user, there is no guarantee that they will arrive in the same order (or on time, or even at all!). In that case, why maintain a strict ordering and the scaling limitations that come with it?
A global total order on messages (and guaranteed delivery in that order) can simplify reasoning about a system. For example, it can be useful to know that if you’ve seen a given message, you’ve already seen all previous messages (WRT the total order). Replication can also be easier with this guarantee: you can quantify how “far behind” one replica is WRT another, and if 2 replicas have the same “last seen message ID”, you know they’re identical.
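A minimal sketch of what that buys you, assuming each message carries a monotonically increasing sequence number (all names here are hypothetical):

    #include <cstdint>

    struct Replica {
        uint64_t last_applied;  // highest message ID applied, in total order
    };

    // With a global total order, "how far behind" is just a subtraction
    // (assuming the leader is ahead)...
    uint64_t lag(const Replica& leader, const Replica& follower) {
        return leader.last_applied - follower.last_applied;
    }

    // ...and equal positions imply identical state, given deterministic
    // application of the ordered log.
    bool identical(const Replica& a, const Replica& b) {
        return a.last_applied == b.last_applied;
    }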
Lastly, but importantly, code is required. You need to take your great ideas and make sure that they solve the problems you think they solve.
I think this is a super important point that seems a little glossed over in the article. In my experience, much of the time you don’t see certain unknowns when you’re planning a solution for a problem. In some cases, it may even be worth writing some exploratory code to map out the problem space before codifying possible solutions.
Kudos for doing the hard task of taking a stand. This writing resonates with me because I’m also trying to “downgrade” my tech assets into manageable solutions, as opposed to letting Google/Apple handle everything for me. I’ve been trying to get started on making my own NextCloud setup at home for myself and others, and been recently looking into things like pmOS for some of my own devices.
Good luck in the journey 🎉 (also there’s a typo in the word “als”).
More power to you. I worry the next generation of kids will know nothing but FAANG, since many (if not all) public schools utilize Google Classroom, post updates on IG and FB, and run Gmail enterprise mail. This is US-specific though, and I am not aware of how it is in EU and Asia.
My generation knew nothing but Big Tech Software™ as well, since many (if not all) public schools utilized Blackboard (where they also posted updates), and ran Microsoft Outlook enterprise mail. Same stuff, different generation. I think we turned out OK.
Agreed that the jump from Blackboard to Microsoft mail to Gmail is not that hard; maybe I did not elaborate my point properly. My bigger fear is that Big Tech Software will weaponize the information they have on my kid in the future. I may be too paranoid, but I think Big Tech has enough information on my kid and her behavior and thought process to be able to subtly manipulate those when she turns an adult. :-(
Consider instead the ironic joke that has circulated a lot in recent years about how it’s actually been the Baby Boom/Gen X parents of today’s young people who used to say “don’t believe everything you see online!” and then… fell into believing everything they saw online, and so have been manipulated into consuming, like addicts, an ever-worsening spiral of outrage content pushed on them by algorithms which optimized for that.
The younger generations have grown up aware of the fact that social media tries to lead them down that path. That alone makes a significant difference in how it affects them.
I sincerely hope that is the case, though based on my personal experience I am not convinced the awareness is where you say it is (but again, that is more of an opinion than something backed by solid data!) :-)
Agreed, I think the next generation is going to suffer a lot because of these dependencies. Because of how deeply rooted Windows/Android/iOS is in the consumer market, businesses are not going to be keen on changing the status quo. The only way to take back control is to start encouraging others and sharing progress or experiences, which I’ll keep doing and always like to read about from others.
IMHO it’s hard to get much out of reading a codebase without necessity. Without a reason why, you won’t do it, or you won’t get much out of it without knowing what to look for.
Yeah, this seems a bit like asking “What’s your favorite math problem?”
I dunno. Always liked 7+7=14 since I was a kid.
Codebases exist to do things. You read a codebase because you want to modify what that is, or to fix it because it’s not doing the thing it’s supposed to. Ideally, my favorite codebase is the one I get value out of constantly but never have to look at. CPU microcode, maybe?
I often find myself reading codebases when looking for examples for using a library I am working with, or to understand how you are supposed to interact with some protocol. Open source codebases can help a lot there. It’s not so much 7 + 7 = 14, but rather 7 + x + y = 23, and I don’t know how to do x or y to get 23, but there are a few common components between the math problems. Maybe one solution can help me understand another?
When I am solving a similar problem, or I’m interested in a class of problems, I sometimes find reviewing a codebase very informative. In my mind, what I’m doing is walking through the various things I might want to do and then reviewing the code structure to see how they’re doing it. It’s also bidirectional: a lot of times I see things in the structure and then wonder what sorts of behavior I might be missing.
I’m not saying don’t review any codebases at all. I’m simply pointing out that without context, there are no qualifiers for one way of coding to be viewed as better or worse than any other. You take the context to your codebase review, whether explicitly or completely inside your mind.
There’s a place for context-free codebase reviews, of course. It’s usually in an academic setting. Everybody should walk through the GoF and functional data structures. You should have experience in a generic fashion working through a message loop or queuing system and writing a compiler. I did and still do, but in the same way I read up on what’s going on in mRNA vaccinations: familiarity. There exists these sorts of things that might help when I need them. I do not necessarily have to learn or remember them, but I have to be able to get them when I want. I know these coding details at a much lower level than I do biology, after all, I’m the guy who’s going to use and code them if I need them. But the real work is matching the problem context up (gradually, of course) with the various implementation systems you might want to use.
There are folks who are great problem-solvers that can’t code. That sucks. There are other folks who can code like the wind but are always putting some obscure yet clever chunk of stuff out and plugging it in somewhere. That also sucks. Good coders should be able to work on both sides of that technical line and move back and forth freely. I review codebases to review how that problem-solving line changed over the years of development, thinking to myself “Where did these guys do too much coding? Too little? Why are these classes or modules set up the way they are (in relation to the problem and maintaining code)?”
That’s the huge value you bring from reviewing codebases: more information on the story of developing inside of that domain. The rest of the coding stuff should be rote: I have a queue, I have a stack, etc. If I want to dive down to that level, start reviewing object interface strategy, perhaps, I’m still doing it inside of some context: I’m solving this problem and decided I need X, here’s a great example of X. Now, start reading and go back to reviewing what they’ve done against the problem you’re solving. Don’t be the guy who brings 4,000 lines of code to a 1 line problem. They might be great lines of code, but you’re working backwards.
Great Picard’s Theorem, obvs. I always imagined approaching an essential singularity and seeing all infinity unfold, like a fractal flower, endlessly repeated in every step.
I’d disagree. While sure, one could argue you just feed a computer what to do, you could make a similar statement about, for example, architecture, where (very simplified) you draw what workers should do and they do it.
Does that mean that architects don’t learn from the work of other architects? I really don’t think so.
But I also don’t think that “just reading” code, or copying some “pattern” or “style” from others, is what improves you. It’s more that if you write code only on your own, or with a somewhat static, like-minded team, your mental constructs don’t really change, while different code bases can challenge your mental model or give you insights into a different mental/architectural model that someone else came up with.
For me that’s not so different from learning different programming languages - like really learning them, not just being able to figure out what it means or doing the same thing you did before with different syntax.
I am sure it’s not the same for everyone, and it surely depends on different learning styles, but I assume that most people commenting here don’t read code like they read a calculation, and I’d never recommend that people just “read some code”. It doesn’t work, just like you won’t be a programmer after just reading a book on programming.
It can be a helpful way of reflecting on your own programming, but very differently from most code reviews (real ones, not some theoretical optimal code review).
Another thing, more psychological maybe, is that I think everyone has seen bad code, even if it’s just some of their own code from a few years ago. Sometimes it helps motivation to come across the opposite: reading a nice code base lets you visualize a goal. The closer it is to practical use, the better, in my opinion. I am not so much a fan of examples or example apps, because they might not work in real-world code bases, but that’s another topic.
I hope, though, that nobody feels like they need to read code when they don’t feel like it and it gives them nothing. Minds work differently, and forcing yourself to do something often seems to counteract how much is actually learned.
Well, it varies. Many contributions end up being a grep away and only make you look at a tiny bit of the codebase. Small codebases can be easier to grasp, as can those with implementation overviews (e.g. ARCHITECTURE.md)
I have to agree with this; I’ve found the most improvement comes from contribution, and having my code critiqued by others. Maybe we can s/codebases to study/codebases to contribute to/?
Even if you don’t have to modify something, reading something out of a necessity to understand it makes it stick better (and more interesting) than just reading it for the sake of reading. That’s how I know more about PHP than most people want to know.
Years ago working on my MSc thesis I was working on a web app profiler. “How can I get the PHP interpreter to tell me every time it enters or exits a function in user code” led to likely a similar level of “I know more about the internals of PHP than I would like” :D
I was looking into this style of error handling last week. Currently the Outcome library looks like the best choice — it can be used standalone or with Boost, only requires C++14, and claims to be quite lightweight.
The upcoming exception refresh in C++2x is going to be similar to these in architecture, but integrated into the language syntax so it looks more like try/catch, and probably faster since the ABI will allow for optimizations like using a CPU flag to indicate whether the return value is a result or an error.
Is there a straightforward way to disable the problematic ALGs? I suppose it varies by what router you’re using. I have an Eero system; its firmware is up to date, but the release history doesn’t mention any fixes for something like this.
Hello, I am here to derail the Rust discussion before it gets started. sudo’s vast repertoire of vulnerabilities, and more broadly its bugs in general, is attributable almost entirely to one thing: its runaway complexity.
We have another tool which does something very similar to sudo which we can compare with: doas. The portable version clocks in at about 500 lines of code, its man pages are a combined 157 lines long, and it has had two CVEs (only one of which Rust would have prevented), or approximately one every 30 months.
sudo is about 120,000 lines of code (100x more), it’s had 140 CVEs, or about one every 2 months since the CVE database came into being 21 years ago. Its man pages are about 10,000 lines and include the following:
$ man sudoers | grep -C1 despair
The sudoers file grammar will be described below in Extended Backus-Naur
Form (EBNF). Don't despair if you are unfamiliar with EBNF; it is fairly
simple, and the definitions below are annotated.
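By contrast, an entire doas policy is typically a line or two (a sketch; the user and command here are made up):

    # /etc/doas.conf - let the wheel group run commands as root,
    # caching credentials like sudo does:
    permit persist :wheel
    # or restrict one user to a single command, with no password:
    permit nopass backup cmd /usr/local/bin/run-backup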
If you want programs to be more secure, stable, and reliable, the key metric to address is complexity. Rewriting it in Rust is not the main concern.
Did you even look at that list? Most of those are not sudo vulnerabilities but issues in sudo configurations distros ship with. The actual list is more like 39, and a number of them are “disputed” and most are low-impact. I didn’t do a full detailed analysis of the issues, but the implication that it’s had “140 security problems” is simply false.
sudo is about 120,000 lines of code
More like 60k if you exclude the regress (tests) and lib directories, and 15k if you exclude the plugins (although the sudoers plugin, which most people use, is 40k lines). Either way, it’s at most half of 120k.
Its man pages are about 10,000 lines and include the following:
12k, but this also includes various technical documentation (like the plugin API); the main documentation in sudoers(1) is 741 lines, and sudoers(5) is 3,255 lines. Well under half of 10,000.
We have another tool which does something very similar to sudo which we can compare with: doas.
Except that it only has 10% of the features, or less. This is good if you don’t use them, and bad if you do. But I already commented on this at HN so no need to repeat that here.
You’re right about these numbers being a back-of-the-napkin analysis. But even your more detailed analysis shows that the situation is much graver with sudo. I am going to include plugins, because if they ship, they’re a liability. And their docs, because they felt the need to write them. You can’t just shove the complexity you don’t use and/or like under the rug. Heartbleed brought the internet to its knees because of a vulnerability in a feature no one uses.
And yes, doas has 10% of the features by count - but it has 99% of the features by utility. If you need something in the 1%, what right do you have to shove it into my system? Go make your own tool! Your little feature which is incredibly useful to you is incredibly non-useful to everyone else, which means fewer eyes on it, and it’s a security liability to 99% of systems as such. Not every feature idea is meritorious. Scope management is important.
what right do you have to shove it into my system?
Nobody is shoving anything into your system. The sudo maintainers have the right to decide to include features, and they’ve been exercising that right. You have the right to skip sudo and write your own - and you’ve been exercising that right too.
Go make your own tool!
You’re asking people to undergo the burden of forking or re-writing all of the common functionality of an existing tool just so they can add their one feature. This imposes a great cost on them. Meanwhile, including that code or feature into an existing tool imposes only a small (or much smaller) cost, if done correctly - the incremental cost of adding a new feature to an existing system.
The key phrase here is “if done correctly”. The consensus seems to be that sudo is suffering from poor engineering practices - few or no tests, including with the patch that (ostensibly) fixes this bug. If your software engineering practices are bad, then simpler programs will have fewer bugs only because there’s less code to have bugs in. This is not a virtue. Large, complex programs can be built to be (relatively) safe by employing tests, memory checkers, good design practices, good architecture (which also reduces accidental complexity), code reviews, and technologies that help mitigate errors (whether that be a memory-safe GC-less language like Rust or a memory-safe GC’ed language like Python). Most features can (and should) be partitioned off from the rest of the design, either through compile-time flags or runtime architecture, which prevents them from incurring security or performance penalties.
Software is meant to serve the needs of users. Users have varied use-cases. Distinct use-cases require more code to implement, and thereby incur complexity (although, depending on how good of an engineer one is, additional accidental complexity above the base essential complexity may be added). If you want to serve the majority of your users, you must incur some complexity. If you want to still serve them, then start by removing the accidental complexity. If you want to remove the essential complexity, then you are no longer serving your users.
The sudo project is probably designed to serve the needs of the vast majority of the Linux user-base, and it succeeds at that, for the most part. doas very intentionally does not serve the needs of the vast majority of the Linux user-base. Don’t condemn a project for trying to serve more users than you are.
Heartbleed brought the internet to its knees because of a vulnerability in a feature no one uses.
Yes, but the difference is that these are features people actually use, which wasn’t the case with Heartbleed. Like I mentioned, I think doas is great – I’ve been using it for years and never really used (or liked) sudo because I felt it was far too complex for my needs; before doas I just used su. But I can’t deny that for a lot of other people (mainly organisations, which is the biggest use-case for sudo in the first place) these features are actually useful.
Go make your own tool! Your little feature which is incredibly useful to you is incredibly non-useful to everyone else
A lot of these things aren’t “little” features, and many interact with other features. What if I want doas + 3 flags from sudo + LDAP + auditing? There are many combinations possible, and writing a separate tool for every one of them isn’t really realistic, and all of this also required maintenance and reliable consistent long-term maintainers are kind of rare.
Scope management is important.
Yes, I’m usually pretty explicit about which use cases I want to solve and which I don’t want to solve. But “solving all the use cases” is also a valid scope. Is this a trade-off? Sure. But everything here is.
The real problem isn’t so much sudo; but rather that sudo is the de-facto default in almost all Linux distros (often installed by default, too). Ideally, the default should be the simplest tool which solves most of the common use cases (i.e. doas), and people with more complex use cases can install sudo if they need it. I don’t know why there aren’t more distros using doas by default (probably just inertia?)
What if I want doas + 3 flags from sudo + LDAP + auditing?
Tough shit? I want a pony, and a tuba, and a barbie doll…
But “solving all the use cases” is also a valid scope.
My entire thesis is that it’s not a valid scope. This fallacy leads to severe and present problems like the one we’re discussing today. You’re begging the question here.
Tough shit? I want a pony, and a tuba, and a barbie doll…
This is an extremely user-hostile attitude to have (and don’t try claiming that telling users with not-even-very-obscure use-cases to write their own tools isn’t user-hostile).
I’ve noticed that some programmers are engineers that try to build tools to solve problems for users, and some are artists that build programs that are beautiful or clever, or just because they can. You appear to be one of the latter, with your goal being crafting simple, beautiful systems. This is fine. However, this is not the mindset that allows you to build either successful systems (in a marketshare sense) or ones that are useful for many people other than yourself, for previously-discussed reasons. The sudo maintainers are trying to build software for people to use. Sure, there’s more than one way to do that (integration vs composition), but there are ways to do both poorly, and claiming the moral high ground for choosing simplicity (composition) is not only poor form but also kind of bad optics when you haven’t even begun to demonstrate that it’s a better design strategy.
My entire thesis is that it’s not a valid scope.
A thesis which you have not adequately defended. Your statements have amounted to “This bug is due to sudo’s complexity which is driven by the target scope/number of features that it has”, while both failing to provide any substantial evidence that this is the case (e.g. showing that sudo’s bugs are due to feature-driven essential complexity alone, and not use of a memory-unsafe language, poor software engineering practices (which could lead to either accidental complexity or directly to bugs themselves), or simple chance/statistics) and not actually providing any defense for the thesis as stated. Assume that @arp242 didn’t mean “all” the use cases, but instead “the vast majority” of them - say, enough that it works for 99.9% of users. Why is this “invalid”, exactly? It’s easy for me to imagine the argument being “this is a bad idea”, but I can’t imagine why you would think that it’s logically incoherent.
Finally, you have repeatedly conflated “complexity” and “features”. Your entire argument is, again, invalid if you can’t show that sudo’s complexity is purely (or even mostly) essential complexity, as opposed to accidental complexity coming from being careless etc.
I don’t think “users (distros) make a lot of configuration mistakes” is a good defence when arguing whether complexity is the issue.
But I do agree about feature set. And I feel like arguing against complexity for safety is wrong (like ddevault was doing), because systems inevitably grow complex. We should still be able to build safe, complex systems. (Hence why I’m a proponent of language innovation and ditching C.)
I don’t think “users (distros) make a lot of configuration mistakes” is a good defence when arguing whether complexity is the issue.
It’s silly stuff like (ALL : ALL) NOPASSWD: ALL. “Can run sudo without a password” seems like a common theme: some shell injection is found in the web UI and because the config is really naïve (which is definitely not the sudo default) it’s escalated to root.
Others aren’t directly related to sudo configuration as such; for example this one has a Perl script which is run with sudo that can be exploited to run arbitrary shell commands. This is also a common theme: some script is run with sudo, but the script has some vulnerability and is now escalated to root as it’s run with sudo.
I didn’t check all of the issues, but almost all that I checked are one of the above; I don’t really see any where the vulnerability is caused directly by the complexity of sudo or its configuration; it’s just that running anything as root is tricky: setuid returns 432 results, three times that of sudo, and I don’t think that anyone can argue that setuid is complex or that setuid implementations have been riddled with security bugs.
Others just mention sudo in passing, by the way; this one is really about an unrelated remote exec vulnerability, and just mentions “If QCMAP_CLI can be run via sudo or setuid, this also allows elevating privileges to root”. And this one isn’t even about sudo at all, but about a “sudo mode” plugin for TYPO3, presumably to allow TYPO3 users some admin capabilities without giving away the admin password. And who knows why this one is even returned in a search for “sudo”, as it’s not mentioned anywhere.
it’s just that running anything as root is tricky: setuid returns 432 results, three times that of sudo
This is comparing apples to oranges. setuid affects many programs, so obviously it would have more results than a single program would. If you’re going to attack my numbers, then at least run the same logic over your own.
Well, whatever we’re comparing, it’s not making much sense.
If sudo is hard to use and that leads to security problems through its misuse, that’s sudo’s fault. Or do you think that the footguns in C are not C’s fault, either? I thought you liked Rust for that very reason. For this reason the original CVE count stands.
But fine, let’s move on under the presumption that the original CVE count is not appropriate to use here, and instead reference your list of 39 Ubuntu vulnerabilities. 39 > 2, Q.E.D. At this point we are comparing programs to programs.
You now want to compare this with 432 setuid results. You are comparing programs with APIs. Apples to oranges.
But, if you’re trying to bring this back and compare it with my 140 CVE number, it’s still pretty damning for sudo. setuid is an essential and basic feature of Unix, which cannot be made any smaller than it already is without sacrificing its essential nature. It’s required for thousands of programs to carry out their basic premise, including both sudo and doas! sudo, on the other hand, can be made much simpler and still address its most common use-cases, as demonstrated by doas’s evident utility. It also has a much smaller exposure: one non-standard tool written in the 80’s and shunted along the timeline of Unix history ever since, compared to a standardized Unix feature introduced by DMR himself in the early 70’s. And setuid somehow has only 4x the number of footgun incidents? sudo could do a hell of a lot better, and it can do so by trimming the fat - a lot of it.
If sudo is hard to use and that leads to security problems through its misuse, that’s sudo’s fault.
It’s not because it’s hard to use, it’s just that its usage can escalate other, more (relatively) benign security problems, just like setuid can. This is my point, as a reply to stephank’s comment. This is inherent to running anything as root, with setuid, sudo, or doas, and it’s why we have capabilities on Linux now. I bet that if doas were the default instead of sudo, we’d have a bunch of CVEs about improper doas usage now, because people do stupid things like allowing anyone to run anything without a password and then writing a shitty web UI in front of that. That particular problem is not doas’s (or sudo’s) fault, just as cutting myself with the kitchen knife isn’t the knife’s fault.
reference your list of 39 Ubuntu vulnerabilities. 39 > 2, Q.E.D.
Yes, sudo has had more issues in total; I never said it doesn’t. It’s just a lot lower than what you said, and quite a number are very low-impact, so I just disputed the implication that sudo is a security nightmare waiting to happen: its track record isn’t all that bad. As always, more features come with more (security) bugs, but use cases do need solving somehow. As I mentioned, it’s a trade-off.
sudo, on the other hand, can be made much simpler and still address its most common use-cases, as demonstrated by doas’s evident utility
We already agreed on this yesterday on HN, which I repeated here as well; all I’m adding is “but sudo is still useful, as it solves many more use cases” and “sudo isn’t that bad”.
Interesting thing to note: sudo was removed from OpenBSD base by millert@, who is also the sudo maintainer. I think he’ll agree that “sudo is too complex to be the default”, which we already agree on, but not that sudo is “too complex to exist”, which is where we don’t agree.
Could sudo be simpler or better architectured to contain its complexity? Maybe. I haven’t looked at the source or use cases in-depth, and I’m not really qualified to make this judgement.
I think arguing against complexity is one of the core principles of UNIX philosophy, and it’s gotten us quite far on the operating system front.
If simplicity had been a goal in sudo, this particular vulnerability would not have been possible to trigger: why have sudoedit in the first place, when it just implies the -e flag? This statement is a guarantee.
If it had ditched C, there is no guarantee that this issue wouldn’t have happened.
There can be logic bugs in basically any language, of course. However, the following classes of bugs tend to be steps in major exploits:
Bounds checking issues on arrays
Messing around with C strings at an extremely low level
It is hard to deny that, in a universe where nobody ever messed up those two points, there are a lot less nasty exploits in the world in systems software in particular.
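To make those two classes concrete, here is a contrived sketch (not sudo’s actual code) of the pattern that keeps going wrong:

    #include <string.h>

    /* Attacker-controlled argv data copied into a fixed buffer with no
       bounds check: anything longer than 63 bytes writes past the end
       of buf and corrupts adjacent memory. */
    void parse_arg(const char *arg) {
        char buf[64];
        strcpy(buf, arg);   /* classic overflow */
        /* ... use buf ... */
    }

    /* A bounds-checked version at least fails closed: */
    int parse_arg_checked(const char *arg) {
        char buf[64];
        size_t len = strlen(arg);
        if (len >= sizeof(buf))
            return -1;      /* reject oversized input */
        memcpy(buf, arg, len + 1);
        /* ... use buf ... */
        return 0;
    }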
Many other toolchains have decided to make the above two issues almost non-existent through various techniques. A bunch of old C code doesn’t handle this. Is there not something that can be done here to get the same productivity and safety advantages found in almost every other toolchain for tools that form the foundation of operating computers? Including a new C standard or something?
I can have a bunch of spaghetti code in Python, but turning that spaghetti into “oh wow, argv contents ran over some other variables and messed up the internal state machine” is a uniquely C problem. If everyone else can find solutions, I feel like C could as well (including introducing new mechanisms to the language; we are not bound by what is printed in some 40-year-old books, and #ifdef is a thing).
EDIT: forgot to mention this: I do think that sudo is a bit special, given that its default job is to take argv contents and run them, so I kinda agree it’s special in terms of exploitability. But hey, the logic bugs by themselves weren’t enough to trigger the bug. When you have a multi-step exploit, anything on the path getting stopped is sufficient, right?
+1. Lost in the noise of “but not all CVEs…” is the simple fact that this CVE comes from an embarrassing C string fuckup that would be impossible, or at least caught by static analysis, or at very least caught at runtime, in most other languages. If “RWIIR” is flame bait, then how about “RWIIP” or at least “RWIIC++”?
Probably Python, given the content of the comment by @rtpg. Python is also memory-safe, while it’s unclear to me whether Pascal is (a quick search reveals that at least FreePascal is not memory-safe).
Were it not for the relative (accidental, non-feature-providing) complexity of Python to C, I would support RWIIP. Perhaps Lua would be a better choice - it has a tiny memory and disk footprint while also being memory-safe.
Probably Python, given the content of the comment by @rtpg. Python is also memory-safe, while it’s unclear to me whether Pascal is (a quick search reveals that at least FreePascal is not memory-safe).
That’s possibly it.
Perhaps Lua would be a better choice - it has a tiny memory and disk footprint while also being memory-safe.
Not to mention that Lua – even when used without LuaJIT – is simply blazingly fast compared to other scripting languages (Python, Perl, &c)! For instance, see this benchmark I did sometime ago: https://0x0.st/--3s.txt. I had implemented Ackermann’s function in various languages (the “./ack” file is the one in C) to get a rough idea of their execution speed, and lo and behold, Lua turned out to be second only to the C implementation.
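For reference, this is the function being benchmarked, per its usual recursive definition (a sketch in C; the linked “./ack” presumably looks similar):

    /* Ackermann's function:
         A(0, n) = n + 1
         A(m, 0) = A(m - 1, 1)
         A(m, n) = A(m - 1, A(m, n - 1))
       It grows explosively (A(3, n) = 2^(n+3) - 3) and is almost pure
       function-call overhead, which makes it a decent stress test of a
       language's call speed. */
    unsigned long ack(unsigned long m, unsigned long n) {
        if (m == 0) return n + 1;
        if (n == 0) return ack(m - 1, 1);
        return ack(m - 1, ack(m, n - 1));
    }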
I agree that rewriting things in Rust is not always the answer, and I also agree that simpler software makes for more secure software. However, I think it is disingenuous to compare the overall CVE count for the two programs. Would you agree that sudo is much more widely installed than doas (and therefore is a larger target for security researchers)? Additionally, most of the 140 CVEs linked were filed before October 2015, which is when doas was released. Finally, some of the linked CVEs aren’t even related to code vulnerabilities in sudo, such as the six Quest DR Series Disk Backup CVEs (example).
I would agree that sudo has a bigger target painted on its back, but it’s also important to acknowledge that it has a much bigger back - 100× bigger. However, I think the comparison is fair. doas is the default in OpenBSD and very common in NetBSD and FreeBSD systems as well, which are at the heart of a lot of high-value operations. I think it’s over the threshold where we can consider it a high-value target for exploitation. We can also consider the kinds of vulnerabilities which have occurred internally within each project, without comparing their quantity to one another, to characterize the sorts of vulnerabilities common to each project, and ascertain something interesting while still accounting for differences in prominence. Finally, there’s also a bias in the other direction: doas is a much simpler tool, shipped by a team famed for its security prowess. Might this not discourage security researchers from targeting it just as much?
Bonus: if for some reason we believed that doas was likely to be vulnerable, we could conduct a thorough audit on its 500-some lines of code in an hour or two. What would the same process look like for sudo?
Another missing point is that Rust is only one of many memory safe languages. Sudo doesn’t need to be particularly performant or free of garbage collection pauses. It could be written in your favorite GCed language like Go, Java, Scheme, Haskell, etc. Literally any memory safe language would be better than C for something security-critical like sudo, whether we are trying to build a featureful complex version like sudo or a simpler one like doas.
I’m not a security expert, so I’d be happy to be corrected, but if I remember correctly, using secrets safely in a garbage-collected language is not trivial. Once you’ve finished working with some secret, you don’t necessarily know how long it will remain in memory before it’s garbage collected, or whether it will be securely deleted or just ‘deallocated’ and left in RAM for the next program to read. There are ways around this, such as falling back to manual memory control for sensitive data, but as I say, it’s not trivial.
That is true, but you could also do the secrets handling in a small library written in C or Rust and FFI with that, while the rest of your bog-standard logic is not beholden to the issues that habitually plague every non-trivial C codebase.
Besides these capabilities, ideally a language would also have ways of expressing important security properties of code. For example, ways to specify that a certain piece of data is secret and ensure that it can’t escape and is properly overwritten when going out of scope instead of simply being dropped, and ways to specify a requirement for certain code to use constant time to prevent timing side channels. Some languages are starting to include things like these.
Meanwhile when you try to write code with these invariants in, say, C, the compiler might optimize these desired constraints away (overwriting secrets is a dead store that can be eliminated, the password checker can abort early when the Nth character of the hash is wrong, etc) because there is no way to actually express those invariants in the language. So I understand that some of these security-critical things are written in inline assembly to prevent these problems.
It looks like it was added to glibc in 2017. I’m not sure if I haven’t looked at this since then, if the resources I was reading were just not up to date, or if I just forgot about this function.
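(Presumably that’s explicit_bzero(3), which landed in glibc 2.25 in early 2017. A minimal sketch of the dead-store problem it addresses:)

    #include <string.h>

    void check_password(void) {
        char buf[128];
        /* ... read the password into buf and verify it ... */

        memset(buf, 0, sizeof(buf));      /* buf is never read again, so
                                             the compiler may delete this
                                             memset as a dead store */
        explicit_bzero(buf, sizeof(buf)); /* defined to never be
                                             optimized away */
    }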
I do think high complexity is the source of many problems in sudo and that doas is a great alternative to avoid many of those issues.
I also think sudo will continue being used by many people regardless. If somebody is willing to write an implementation in Rust which might be just as complex but ensures some level of safety, I don’t see why that wouldn’t be an appropriate solution to reducing the attack surface. I certainly don’t see why we should avoid discussing Rust just because an alternative to sudo exists.
Talking about Rust as an alternative is missing the forest for the memes. Rust is a viral language (in the sense of internet virality), and a brain worm that makes us all want to talk about it. But in actual fact, C is not the main reason why anything is broken - complexity is. We could get much more robust and reliable software if we focused on complexity, but instead everyone wants to talk about fucking Rust. Rust has its own share of problems, chief among them its astronomical complexity. Rust is not a moral imperative, and not even the best way of solving these problems, but it does have a viral meme status which means that anyone who sees through its bullshit has to proactively fend off the mob.
I don’t understand why you hate Rust so much but it seems as irrational as people’s love for it.
Rust’s main value proposition is that it allows you to write more complex software that has fewer bugs, and your point is that this is irrelevant because the software should just be less complex. Well I have news for you, software is not going to lose any of its complexity. That’s because we want software to do stuff, the less stuff it does the less useful it becomes, or you have to replace one tool with two tools. The ecosystem hasn’t actually become less complex when you do that, you’re just dividing the code base into two chunks that don’t really do what you want.
I don’t know why you hate Rust so much to warrant posting anywhere the discussion might come up, but I would suggest if you truly cannot stand it that you use some of your non-complex software to filter out related keywords in your web browser.
Agree with what you’ve written, but just to pick at a theme that’s bothering me on this thread…
I don’t understand why you hate Rust so much but it seems as irrational as people’s love for it.
This is obviously very subjective, and everything below is anecdotal, but I don’t agree with this equivalence.
In my own experience, everyone I’ve met who “loves” or is at least excited about rust seems to feel so for pretty rational reasons: they find the tech interesting (borrow checking, safety, ML-inspired type system), or they enjoy the community (excellent documentation, lots of development, lots of online community). Or maybe it’s their first foray into open source, and they find that gratifying for a number of reasons. I’ve learned from some of these people, and appreciate the passion for what they’re doing. Not to say they don’t exist, but I haven’t really seen anyone “irrationally” enjoy rust - what would that mean? I’ve seen floating around a certain spiteful narrative of the rust developer as some sort of zealous online persona that engages in magical thinking around the things rust can do for them, but I haven’t really seen this type of less-than-critical advocacy any more for rust than I have seen for other technologies.
On the other hand I’ve definitely seen solid critiques of rust in terms of certain algorithms being tricky to express within the constraints of the borrow checker, and I’ve also seen solid pushback against some of the guarantees that didn’t hold up in specific cases, and to me that all obviously falls well within the bounds of “rational”. But I do see a fair amount of emotionally charged language leveled against not just rust (i.e. “bullshit” above) but the rust community as well (“the mob”), and I don’t understand what that’s aiming to accomplish.
I agree with you, and I apologize if it came across that I think rust lovers are irrational - I for one am a huge rust proselytizer. I intended the irrationality I mentioned to be the perceived irrationality DD attributes to the rust community.
Definitely no apology needed, and to be clear I think the rust bashing was coming from elsewhere, I just felt like calling it to light on a less charged comment.
I think the criticism isn’t so much that people are irrational in their fondness of Rust, but rather that there are some people who are overly zealous in their proselytizing, as well as a certain disdain for everyone who is not yet using Rust.
Here’s an example comment from the HN thread on this:
Another question is who wants to maintain four decades old GNU C soup? It was written at a different time, with different best practices.
In some point someone will rewrite all GNU/UNIX user land in modern Rust or similar and save the day. Until this happens these kind of incidents will happen yearly.
There are a lot of things to say about this comment (IMO it’s entirely false), but above all it’s not exactly a nice comment. And why Rust? Why not Go? Or Python? Or Zig? Or something else?
Here’s another one:
Rust is modernized C. You are looking for something that already exists. If C programmers would be looking for tools to help catch bugs like this and a better culture of testing and accountability they would be using Rust.
The disdain is palpable in this one, and “Rust is modernized C” really misses the mark IMO; Rust takes a vastly different approach. You can consider that a good or a bad thing, but it’s really not the only approach towards memory-safe programming languages.
Of course this is not representative of the entire community; there are plenty of Rust people that I like who hold considerably more nuanced views – which are also expressed in that HN thread – but comments like these are frequent enough to leave a somewhat unpleasant taste.
While I don’t approve of the deliberately inflammatory form of the comments, and don’t agree with the general claim that all complexity can be eliminated, I personally agree that, in this particular case, simplicity > Rust.
As a thought experiment: world 1 uses sudo-rs as its default implementation of sudo, while world 2 uses doas, 500 lines of C. I do think that world 2 would be generally more secure. Sure, it’ll have more segfaults, but fewer logical bugs.
I also think that the vast majority of world 2 populace wouldn’t notice the absence of advanced sudo features. To be clear, the small fraction that needs those features would have to install sudo, and they’ll use the less tested implementation, so they will be less secure. But that would be more than offset by improved security of all the rest.
Adding a feature to a program always has a cost for those who don’t use it. If the feature is obscure, it might be overall more beneficial to have a simple version used by 90% of the people, and a complex one for the remaining 10%. The 10% would be significantly worse off compared to the unified program. The 90% would be slightly better off. But 90% >> 10%.
Rust’s main value proposition is that it allows you to write more complex software that has fewer bugs
I argue that it’s actually that it allows you to write fast software with fewer bugs. I’m not entirely convinced that Rust allows you to manage complexity better than, say, Common Lisp.
That’s because we want software to do stuff, the less stuff it does the less useful it becomes
Exactly. Software is written for people to use. (Technically, only some software - other software, such as demoscene productions, is written for the beauty of it, or for the enjoyment of the programmer; but in this discussion we only care about the former.)
The ecosystem hasn’t actually become less complex when you do that
Even worse - it becomes more complex. Now that you have two tools, you have two userbases, two websites, two source repositories, two APIs, two sets of file formats, two packages, and more. If the designs of the tools begin to differ substantially, you have significantly more ecosystem complexity.
You’re right about Rust’s value proposition; I should have added performance to that sentence. Or I should have just said managed language, because as another commenter pointed out, Rust is almost irrelevant to this whole conversation when it comes to preventing these types of CVEs.
The other issue is that it is a huge violation of principle of least privilege. Those other features are fine, but do they really need to be running as root?
Plugins, an integrated log server, TLS support… none of those are things I’d want in a tool that should be simple and is installed as suid root.
(Though I don’t think complexity and memory safety are necessarily opposed solutions. You could easily imagine a sudo-alike tool that is written in Rust and does not come with unnecessary complexity.)
There’s nothing wrong with EBNF, but there is something wrong with relying on it to explain an end-user-facing domain-specific configuration file format for a single application. It speaks to the greater underlying complexity, which is the point I’m making here. Also, if you ever have to warn your users not to despair when reading your docs, you should probably course correct instead.
Rewrite: The point that you made in your original comment is that sudo has too many features (disguising it as a point about complexity). The manpage snippet that you’re referring to has nothing to do with features - it’s a mix between (1) the manpage being written poorly and (2) a bad choice of configuration file format resulting in accidental complexity increase (with no additional features added).
Taking Drew’s statement at face value: There’s about to be another protracted, pointless argument about rewriting things in rust, and he’d prefer to talk about something more practically useful?
I don’t understand why you would care about preventing a protracted, pointless argument on the internet. Seems to me like trying to nail jello to a tree.
This is a great opportunity to promote doas. I use it everywhere these days, and though I don’t consider myself any sort of Unix philosophy purist, it’s a good example of “do one thing well”. I’ll call out Ted Unangst for making great software. Another example is signify. Compared to other signing solutions, there is much less complexity, much less attack surface, and a far shallower learning curve.
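To make the scale difference concrete, a complete doas policy for a typical single-admin machine can be a single line (a sketch; substitute whatever your admin group is called):

```
# /etc/doas.conf: allow members of the wheel group to run commands as
# root, remembering successful authentication for a short while
# (comparable to sudo's timestamp behavior).
permit persist :wheel as root
```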
I’m also a fan of tinyssh. It has almost no knobs to twiddle, making it hard to misconfigure. This is what I want in security-critical software.
All of the above is orthogonal to choice of implementation language. You might have gotten a better response in the thread by praising doas and leaving iron oxide out of the discussion. ‘Tis better to draw flies with honey than with vinegar. Instead, you stirred up the hornets’ nest by preemptively attacking Rust.
PS. I’m a fan of your work, especially Sourcehut. I’m not starting from a place of hostility.
Sudo was first conceived and implemented by Bob Coggeshall and Cliff Spencer around 1980 at the Department of Computer Science at SUNY/Buffalo. It ran on a VAX-11/750 running 4.1BSD. An updated version, credited to Phil Betchel, Cliff Spencer, Gretchen Phillips, John LoVerso and Don Gworek, was posted to the net.sources Usenet newsgroup in December of 1985.
The current maintainer is also an OpenBSD contributor, but he started maintaining sudo in the early 90s, before OpenBSD forked from NetBSD. I don’t know when he started contributing to OpenBSD.
So I don’t think it’s fair to say that sudo originated in OpenBSD :)
Then go use something like Hugo, etc. This tool lets you edit live; that’s a non-trivial task when the files are 100% static. There is a trade-off for this.
Personally I just use Fossil-scm for websites these days. I can host files, edit locally or remotely (live or not), sync changes, etc. It also is just a sqlite file.
I can’t imagine go + sqlite not being fast enough for almost all websites in the world that would ever use something like this just doing the rendering every page load.
I haven’t looked at how this is coded, but if that is required for load reasons, then a cache would probably be a better solution.
But you are right, one could store in sqlite the rendered version (or even out on temp disk), for example.
But you are technically correct, sure. I don’t see any reason to bother with the added complexity.
Just turning on your nginx caching option (or other front-end handling the TLS cert) would almost certainly be 99% easier and achieve basically the same effect.
I can’t imagine go + sqlite not being fast enough for almost all websites in the world that would ever use something like this just doing the rendering every page load.
Static is simpler.
I don’t see any reason to bother with the added complexity.
Again, static is simpler.
Just turning on your nginx caching option
Now that’s insane. A cache frontend for a backend that’s dynamically-rendering static content.
Instead of just serving the static content directly.
99% easier
Than serving static content? You can’t be serious.
Yes, static content is simpler, but is it simpler for this project, the way it’s implemented currently? I’d argue no.
You have to render the content at some point. You have effectively two choices: you can render at save time and create, effectively, a static site, or you can render at load time.
You seem to think rendering at save time is the better choice. Then you would save both copies, the raw MD file and the rendered HTML, and put them both in your sqlite DB. Then at HTTP GET time, you can simply stream the rendered version from the sqlite file. (Alternatively you could store the rendered content out in some file somewhere, I guess, complicating the code even further; sqlite is often faster than open() anyway, so I’d argue that point also.)
The problem is, it’s easy to have cache and sync issues. If you do it at save time, and there is exactly one way to edit, then the cache and sync issues basically go away. But there is more than one way to edit: you can edit using anything that can speak sqlite, by calling the sqlite CLI directly, or by using the web interface. The big feature of this code base is the ‘live edit’ feature of the CMS, so one could punt the problem and save two copies in sqlite, the raw MD and the rendered version, and if you are so inclined to edit outside of the live edit session, then it’s your problem to update the cache also.
Alternatively, do it at read time (HTTP GET), and save yourself the headache of cache and sync issues. This is the simpler version. Seriously, it is: it took one sentence, vs. a paragraph for rendering at save time.
Complication matters. Static seems simpler, but is it really? Not always.
Sure, but reading from sqlite is dynamic. Whereas any static webserver can serve a plain file. I prefer static webservers, as they are the simplest. This means low LoC count, which means easy to understand/audit/sandbox.
Specifically, I use openbsd’s httpd, and I would like to eventually move away from UNIX for my public services whenever possible (e.g. to a component system on top of seL4). A static website served from plain files is best.
Changing the goalposts again, that’s fine I can meet your goalpost here too :)
Reading from a static sqlite file isn’t any more dynamic than open() on a file. They are both reading effectively static content in the above scenario of rendered content.
I agree that, from a security point of view, something like seL4 will be more secure, for some definitions of secure. But at some point we are just messing around and not actually solving problems.
What are the security risks you are trying to protect against? Without identifying the risks, it’s very hard to mitigate them. Otherwise we are just playing security theatre games, for zero benefit.
What’s the worst case scenario here, that someone manages to get into a static content web server? Given proper permissions, nothing. If they get in, install a rootkit, and get write access, the situation is worse, but again, given static content on the likes of a personal project, the consequences are equally trivial, I imagine.
Anyways, you didn’t refute any of my statements above, so I’m glad we finally agree, static is not always better, or even simpler.
Like I mentioned way up thread, I like live-edit and I’m very lazy. I just use Fossil-scm for my websites. They are technically dynamic now, but it’s amazingly easy to make them go, and I even get a built-in forum and bug-tracking for feedback, email notifications, etc. I get 100% revision and history control, it’s auditable, offline-sync capable, live-edit capable, etc. Plus deployment is super easy: a single binary that does everything, and backups are equally trivial as it’s a single sqlite file. Because of the offline capabilities, I generally have a few copies lying about anyway, and it’s all cross-platform.
Reading from a static sqlite file isn’t any more dynamic than open() on a file. They are both reading effectively static content in the above scenario of rendered content.
My webserver doesn’t support serving from static sqlite files. Dynamic as in, I’d have to run cgi code in addition to my webserver.
Like I mentioned way up thread, I like live-edit and I’m very lazy.
Me too, thus I’d love a CMS. It’s just, while dynamic is good for me (the one writing articles), it is unnecessary for viewers. I currently use a static site generator, which takes Markdown as input.
I do not wish to change the setup of my public site, which is a static site.
What are the security risks you are trying to protect against? Without identifying the risks, it’s very hard to mitigate them. Otherwise we are just playing security theatre games, for zero benefit.
On my (public) personal sites, I simply want to minimize complexity, which should mitigate a broad range of security risks. It’s a win/win strategy.
Just turning on your nginx caching option (or other front-end handling the TLS cert) would almost certainly be 99% easier and achieve basically the same effect.
In terms of the security analysis it’s a completely different thing to have a dynamic application running exposed to the internet, even if you cache it.
OK, I think I get your point (that in security, complexity hurts you), but I think we have very different understandings of security analysis, so I’m going to write some stuff here.
You can’t talk about security mitigations without talking about the specific risks you are trying to eradicate.
Nginx is dynamic. openbsd’s HTTPD is dynamic. Any “static webserver” is dynamically taking inputs (HTTP GET requests) and mapping them to files out on a filesystem. Nothing is stopping nginx from serving /etc/shadow or /home/rain1/mysupersecretthinghere, except some configuration and (hopefully) some file permissions.
This is no different than program X taking an HTTP get, opening a sqlite file and serving the results out of a column. It’s totally doable and equally “dynamic” for the most part.
I think what you are trying to say is: if rwtxt (since we are in the rwtxt thread) happens to be compromised (and I get complete control of its memory), I can convince rwtxt to render to you whatever I want.
Except the same is true of any other webserver. If I compromise nginx serving files from a filesystem (in the same way as above), I can also have it render to you whatever I want.
There is basically no difference, from a security analysis point of view, between rwtxt with a sqlite file and nginx with an html file. Both files can be read-only from the web server’s perspective; of course then rwtxt’s live edit will not work, but hey, we are trying to be SECURE DAMMIT! lol.
The difference here, from a security analysis perspective, is that nginx is way, way, way more popular than rwtxt (today; who knows about tomorrow), so the chances of finding a complete compromise of nginx are, one hopes, much, much lower than for rwtxt, a tiny project mostly (completely? – didn’t look) written by one person. Of course the opposite is also true: way more bad people are looking at how to compromise nginx than rwtxt. There is something to be said for security through obscurity, in a vague, hand-wavy sort of way… as long as you are not an active target of bad people.
Hence why we go back to: you can’t talk about practical security mitigations without talking about the specific risks you are trying to eradicate.
So mostly your sentence makes no sense from a security analysis perspective.
Please, please, please don’t comment code as per the “good” code example. In my opinion, comments should explain the “why” or other non-obvious design decisions instead of merely reiterating what the code is doing.
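As a contrived sketch of the difference (everything below is invented for illustration):

```cpp
#include <vector>

// Computes the mean of a series of samples.
double average(const std::vector<double>& xs) {
    // A "what" comment would merely restate the next line ("return 0 if
    // the vector is empty"). The "why" is what's worth recording:
    // callers treat an empty series as "no signal", and 0.0 is the
    // documented sentinel for that case.
    if (xs.empty()) return 0.0;

    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum / static_cast<double>(xs.size());
}
```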
Yeah, the example is uncompelling and I would not look kindly on it in a review.
That said, projects like GCC have comments of the “it does this” nature, and they are immensely useful because it is usually not obvious what the code is, in fact, doing. The reasons for this are legion, but even something seemingly simple benefits from basic comments, because you often end up jumping into such code from something essentially unrelated. Without those kinds of comments, you would end up spending an incredible amount of time getting to know the module (which is often very complicated) just to get what tends to be tangential, yet important, information.
Nice savings! I’m a little confused as to the use of PubSub for the import/export process though. IIUC Elasticsearch already has existing methods to migrate indexes. Would it have been even cheaper to use these methods and cut the PubSub cost out of the migration process?
I used to be a Gentoo user in its early days. It’s great. But it also sparked a lot of cool technologies, including things like probably the best init system for Linux, OpenRC. They did really cool stuff, they used to have (still do?) one of the biggest package collections. Speaking about quality packages, not AUR - which I love, but is… well it’s what you get when everyone can just add a package.
Another strong suit used to be portability. I had a time when I loved getting OSS to run on exotic hardware and such and Gentoo did beat Debian on for example PA-RISC.
Later I was surprised to find that at university people ran Gentoo images at the university’s computers at a large scale. Gentoo felt a bit niche before that.
They also had a time when a lot of leading edge security stuff happened there, because it was easy to just build things a certain way and get a patched kernel.
While I haven’t used Gentoo in two decades or so, I think there is many “small” (in popularity, not in work or impact) things that really advanced Linux and the whole ecosystem, at least in inspiration and prior art.
Check it out again :-) they’ve done a lot of work to improve the experience, including a binary package host for faster builds. I run it on my VPS and love it.
Actually, I saw that some time ago, and every now and then I think about it. It’s on my bucket list.
After reading, I still don’t think I fully understand what Antithesis is. I think the bit that’s getting me is how something could be completely reproducible unless you ran it on actual hardware.
They run your software on their virtualized, deterministic OS which is a fork of one of the BSDs, I forget which.
They use their own deterministic fork of bhyve on FreeBSD.
aside: Oh, Andrew Kelley! I didn’t expect that you’d be the person to respond to me. You’re on my “people I think are cool” list on my website.
The more I hear about people using BSDs for different tasks, the more I feel like I should spend some time using one.
Cheers, thanks for the kind words
I’m curious as to why Antithesis uses FreeBSD for their hypervisor instead of Linux. Anyone know?
My guess is that making a deterministic hypervisor involves a lot of kernel hacking, and BSD source code is much smaller, coherent, and maintainable by fewer people than Linux
Linux is like a big sprawling thing that uses every trick in the book, with code from thousands of people, and hundreds of different companies
Also I recall hearing specifically that the hypervisor support in Linux is a nightmare. This might have been from some talks on Solaris/Illumos (e.g. from Bryan Cantrill), which shares some lineage with BSDs
(I am not a kernel hacker, nor do I have any first hand knowledge, so take this with a grain of salt. However I do know people who have worked on high perf networking in the kernel, and say there is a stark difference between BSDs and Linux)
I suspect this is the main reason, another one may be the license since bhyve and FreeBSD are licensed under the BSD license, which is more permissive than Linux’ GPLv2.
I sent an email to Antithesis and received a reply from their CEO:
Yes, I regularly remove stories that seek customer service via public shaming. I don’t want Lobsters used to whip up a mob in an outrage and direct them at targets. It always feels righteous at first and becomes an awful tool for abuse.
I don’t think it’s right to call this a post seeking customer service. It’s a post calling attention to a policy change made by Automattic that immediately affected the privacy of all Tumblr users, and all Wordpress.com users. Users of those platforms have to revoke the consent that Automattic assumed it had - wrongly, in the view of OP. Is that outwith the scope of Lobsters?
tumblr & WordPress posts were already being scraped by AI companies, just like the rest of the public Internet, and users had no control over it.
Automattic just gave control to users, allowing them to increase privacy. The opt-in nature means there is no change in behavior unless the user chooses to take action, which seems reasonable to me.
It is explicitly opt-out, not opt-in. The original post rightly flagged this as a problem of the platform assuming user consent. Content scraping is already happening, yes, but as SoapDog said below, Automattic is directly profiting from this sale of data, and that data originally dumped by Tumblr included private and deleted posts, not just public ones. It’s unclear whether this was given to OpenAI.
As far as I am aware, it is “opt out” and not “opt in”, and that is quite different. Also, the posts being scraped by AI companies harvesting the Web is already a problem, but there the problem is with the AI companies. Automattic PROFITING from users’ content by packaging it and selling it without consent is even worse and makes it their problem.
What opt-in nature are you referring to? The post is complaining about Automattic creating a new default: selling your data directly to AI companies, without compensating you, unless you explicitly opt out. Sure, for most sites, AI scrapers could do that anyway just by ignoring your robots.txt, as always. But there doesn’t seem to be any opt-in facet to this change that I can see.
Naturally, for public posts, even if users choose to opt out, their stuff still gets scraped unless they also pay-wall or login-wall the material. But it sounds like automattic is selling more than that unless you proactively opt out.
So I don’t see how Automattic is increasing control or privacy so far… can you elaborate more?
Sorry, that was a typo; I meant opt-out.
There is no change in behavior by default: your data was scraped by AI companies before, and it still is. The change is that you are now allowed to opt out. That sounds positive to me.
Before, only your public data was being scraped by AI companies. I believe that public data will continue to be scraped by AI companies regardless of whether you use automattic’s opt-out mechanism. It’s public, after all.
Now, automattic is offering those AI companies more of your data. And they will share that by default unless you opt out.
If all of the data you share with automattic is public, it seems like no change to me.
If some of the data you share with automattic is not public, it sounds like a significant downgrade to need to opt out.
tumblr in particular doesn’t really have non-public content (unless there’s some option I’ve never discovered in the 12 years I’ve been using it.)
Grabbed the link from the moderation queue to see what the fuss was about.
The post was not asking for customer service, but instead the change in said service was the catalyst for discussing a broader issue in Customer-Generated content and policy surrounding it.
For those wondering, a summary of the article: “Opt-out isn’t a good model when it comes to handling scrapers and similar, and in continuing to legitimize this behavior, companies that engage in this are eroding the discussion around the consent of a company’s handling of customer data. Automattic has decided to engage in this behavior, and I happen to pay Automattic to host my blog, but the issue is far greater and I felt the need to speak on it.”
This is voicing displeasure with a policy change and UX dark patterns that enable technical actions which are not always in the best interests of users/customers/etc. This is not “uwu automattic locked me out of my tumblr for women offering plumbing supplies they suck go beat on their door”, it’s a discussion of “just how much leeway does a company have with the data that they store for a customer, and when is it better to ask permission rather than require explicit disapproval?”
It’s a screed, not a discussion.
A discussion would include things like:
The blogpost doesn’t really do any of those things, and certainly not at a level beyond the most simple, knee-jerk, and facile.
I am unaware of lobste.rs enforcing such criteria in the past or having a general rule against “screeds”. There’s even a rant tag. And plenty of “simple, knee-jerk and facile” posts show up here and don’t get removed. So I think you will need to find a better argument against the post in question.
Your reading of my comment is incorrect.
My argument there is not with the submission (though I complain about that elsewhere), but with the comment I’m replying to: the comment is claiming that the post is a discussion, I claim the post is merely a screed–and not a particularly good one at that–and give some examples of what would elevate it.
By the way, the rant tag also has some effects that hint that it isn’t the preferred content here. Some good rants make it through, of course.
As do some bad ones.
The article in question seems very relevant and educational to the Lobste.rs community given the tech industry’s poor understanding of consent and widespread abusive practices around user data.
As far as I know, based on logic and personal experience, the tech industry understands consent quite well. Decisions to make things opt-out are not accidental. It’s a business decision made in consultation with legal, based on a desired outcome. There’s no technical question, and I doubt this is a surprising situation to anyone involved in implementing an opt-out like this.
IMO you provide an excellent example. You are confusing “do I have consent?” with “is it legal?”.
Their users consented to use of their data for providing blogging services. Selling that data to a 3rd party for other, unrelated reasons is unethical because the user did not know about this possible use when they signed up for the service. It’s unethical to automatically opt them in.
I don’t think he was. From what I’m reading, he was politely saying that those tech businesses deliberately ignore consent, and instead just look at money and law. Cynicism, not incompetence.
Now if I were asked to implement that kind of opt-out dark pattern, I would definitely consider answering “sorry, find someone else”. I could afford it right now.
I totally get that. I trust that is exactly the discussion that business had with legal (and corporate comms and marketing and government affairs). And the only technical option you have in this situation is to find another job. So while this is a completely valid and necessary topic, it doesn’t seem (to me) like Lobsters is the right place for it.
I agree. There’s no technical difficulty in making something “opt-in” instead of “opt-out”, and a discussion of why opt-in is better isn’t technical and isn’t going to make me a better programmer.
As I write this, it’s Wednesday in my time zone.
If, hypothetically, some open-source project announced tomorrow – Thursday – that as the Next Chapter of their Exciting Open Source Journey they’re switching to BSL or another “source available” license in order to better monetize the project, and
If, hypothetically, I were to write and publish a blog post the following day – Friday – talking about the philosophy and ethics of Free Software and Open Source and condemning such switches, and calling on people to vote with their wallets by ceasing use of such a project, then
Would you remove that post from lobste.rs?
I ask because so far as I can tell, such a post would not differ in form or aims from the removed post under discussion, yet posts which couch “customer service complaints” or “business news” in even the thinnest possible framing of being about FLOSS licensing and licensing ethics don’t seem to get removed the way this one did. Heck, sometimes just the pure “business news” of a license change announcement is left up as apparently on-topic.
And to register a personal opinion, I think the post being discussed here was more on-topic for lobste.rs – if viewed through the lens of “pertains to computing” – than licensing slapfight threads typically are. I also think the post being discussed here was on-topic for lobste.rs and should not have been removed.
There is nothing about this seeking “customer service”, Peter.
You’re right that your article isn’t seeking customer service, but I do think that the second part of @pushcx’s comment - “I don’t want Lobsters used to whip up a mob in an outrage and direct them at targets” - is a valid choice. It’s not the choice you made for your (excellent!) blog, but I don’t think @pushcx is making an invalid choice for Lobsters.
I’d already read your article via Mastodon, and liked it - and while I agree your article is not a request for customer service, I do think one could reasonably call it advocacy.
I do hope that you’ll continue to find value in Lobsters regardless of the outcome of this thread; I’ve enjoyed reading your comments here (as well as your blog), and I wish you the very best.
I am curious to know what you think of my analogy to posts about companies doing license changes, which to me are largely indistinguishable in form from the removed post under discussion here, but somehow are still allowed (despite being “business news” and not “pertaining to computing” and often being used to “whip up a mob” and “direct them at targets”).
It might be relevant that changes to blogging platforms affect authors more broadly, while license changes affect developers in particular. Lobsters caters to both to a degree, but more to the latter than the former.
Automattic is a store of data; they also have the contact details of those affected by the decisions made (and can directly inform the users). All users are implicitly at the mercy of any changes in ToS, and must respond / care / etc. within a reasonable amount of time.
A codebase’s only transaction with users is when those users acquire the code; at that time they can check the license and decide how they feel about it. They only need to check the license when acquiring the code, at no other time. There is no mechanism to convey this information to the users otherwise. And it does not apply within a reasonable amount of time either.
I think they are very distinguishable.
There’s also a privacy tag here for use and abuse of one’s data – not even necessarily one’s confidential data. See, for example, threads about people leaving GitHub to avoid having their code used for things like Copilot.

So I still don’t see a meaningful difference between the post being discussed here, and many things which have gone un-removed in the past.
(First: I really appreciated your post. I was a bit out of the loop because of other things eating my attention lately, and your post did a great job of both catching me up on things I’d seen on fedi but hadn’t carefully read yet, and contextualizing the underlying consent issues. Thank you for writing that.)
It clearly wasn’t seeking customer service, but if the mod message had said “lobste.rs is not your torch and pitchfork outlet” instead, I’m not sure I’d have batted an eye. And the two messages are really mostly equivalent, IMO.
I have no power here; I simply like this place and enjoy many of the discussions that can be had here. But I don’t really feel that this site is a good place to discuss your (IMO excellent) post. It could draw a good discussion here, but it could also (more likely, IMO) draw a really terrible one. The only reason I wish it got left up on the page is because I really feel the points you made need a signal boost in the industry.
But I’m pretty sure that I’d have needed to hit “hide” on the comments to avoid being drawn into a flame war.
Thanks for all you write. I always learn something when I read it.
I believe you, but note it’s indistinguishable from that. Someone could write that article or post it here with a different motive.
While I agree with the moderation decision, I was wondering if you would be open to rewording the mod messages in a more compassionate manner? I think they are a little abrasive, and the wording might be the reason why folks are getting upset.
Additionally, while all of the moderation actions are transparent, I think the guidelines for posting are difficult to find. They are buried under “tags and topicality” on the About page, mixed in with information about how the tagging and ranking system works. The orange site has a clear set of guidelines that one can find linked on the bottom of the site.
Thanks, these are all really good points. The About page started as a post about technical features, and after years of edits it really wasn’t clear what was happening in that section. I’ve lifted the topicality info up to a top-level section titled Guidelines and expanded it with sections on the site climate (where I’ve tried to capture the site’s vibe in positive terms rather than a list of “do not”s), this topic of brigading, and self-promo. I took this language from the mod log, hatted comments/DMs, and meta threads, and I’ll need to do a comprehensive review of those at some point to flesh things out. I hope folks will suggest things I’ve missed or could’ve explained better; I’m particularly not satisfied that I had to handwave a bit about where to draw the line on brigading, and I’d like to do better than this slightly “know it when I see it” standard.
I’ll try to echo this less frustrated language in future mod messages, or otherwise make those clearer and more actionable. Thanks for the criticism.
I am grateful that Lobsters is not an “outrage-driven” news site.
I’m not sure if this is still the case, but one of the dangers with AVX-512 instructions is that they can cause performance degradations due to how much power they use. It would be interesting to see how this optimization held up outside of the realm of micro-benchmarks.
It’s fine now: “The results paint a very promising picture of Rocket Lake’s AVX-512 frequency behavior: there is no license-based downclocking evident at any combination of core count and frequency”, license in this context being the Intel term for instruction heaviness
That’s good to know that there’s not the automatic downclocking like in Haswell/Skylake. Thanks for sharing!
The other limits would still be relevant especially in virtualized / containerized / other multi-tenant workloads, wouldn’t they?
Personally I know of the following options:

- uhttpd (https://git.openwrt.org/project/uhttpd.git) – from the OpenWRT project, powering their UI; built-in support for Lua (the documentation is somewhat lacking…)
- civetweb (https://github.com/civetweb/civetweb) – I’ve just stumbled upon this one today while looking at the options available in BuildRoot; looks promising!
- thttpd (https://acme.com/software/thttpd/) – a venerable HTTP server;
- webfsd – has support for CGI, but is meant mainly for static content.

I hope others will give pointers to other similar projects!
Why do you consider nginx heavy-weight? Many of the testimonials on their website show a memory usage in the single digit MBs.
When I say “heavy-weight” I mean something along the lines of “when considering many aspects of the software solution, the outcome is on the heavy side”; for example (and here I speak about general-purpose HTTP servers like NGINX / Apache / etc.):
However, perhaps my main issue with these general purpose servers is that they are tailored for “normal” servers (with lots of RAM and CPU, at least relative to small embedded systems), and if you want them to run well on embedded devices you’ll have to tweak a lot of knobs.
Thus my quest for an HTTP server geared towards small / embedded deployments: I can assume that the developers have already chosen defaults that are well suited for such environments.
… I think this might be a valid criticism of Nginx, due to its compiled-in nature for “plugins”, but with Apache, almost everything is provided by loadable modules. Don’t load the modules, and you don’t get the complexity/attack surface/dependencies.
There is a setting for BitLocker that locks the TPM with a PIN, which should defeat this attack.
A really nice error handling pattern in C++ is using something like absl::StatusOr or std::expected. I like this a lot better than exceptions since the control flow is a lot more explicit.
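For readers who haven’t seen the pattern, here is a minimal C++23 sketch of the std::expected style (parse_port is a made-up example, not from any real codebase):

```cpp
#include <exception>
#include <expected>   // C++23
#include <iostream>
#include <string>

// Errors travel in the return value, so every failure path is visible
// at the call site instead of unwinding invisibly like an exception.
std::expected<int, std::string> parse_port(const std::string& s) {
    int port = 0;
    try {
        port = std::stoi(s);
    } catch (const std::exception&) {
        return std::unexpected("not a number: " + s);
    }
    if (port < 1 || port > 65535)
        return std::unexpected("port out of range: " + s);
    return port;
}

int main() {
    auto r = parse_port("8080");
    if (r)
        std::cout << "port = " << *r << '\n';
    else
        std::cout << "error: " << r.error() << '\n';  // explicit failure path
}
```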
Working std::expected link: https://en.cppreference.com/w/cpp/utility/expected

In my experience, it’s very rare that you actually need strict ordering in your message queue. For instance, the delivery notifications example in the article is not strictly ordered end-to-end. Once your notifications are sent to the user, there is no guarantee that they will arrive in the same order (or on time, or even at all!). In that case, why maintain a strict ordering and the scaling limitations that come with it?
A global total order on messages (and guaranteed delivery in that order) can simplify reasoning about a system. For example, it can be useful to know that if you’ve seen a given message, you’ve already seen all previous messages (WRT the total order). Replication can also be easier with this guarantee: you can quantify how “far behind” one replica is WRT another, and if 2 replicas have the same “last seen message ID”, you know they’re identical.
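As a toy sketch of that reasoning (the types and names here are invented): with a global total order, each replica can be summarized by a single counter, and both “how far behind” and “are these identical” reduce to comparisons on it.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>

// With a global total order, a replica's state is summarized by the ID
// of the last message it has applied.
struct Replica {
    std::uint64_t last_applied;
};

// "How far behind" is a subtraction on one counter.
std::uint64_t lag(const Replica& leader, const Replica& follower) {
    return leader.last_applied - std::min(leader.last_applied, follower.last_applied);
}

int main() {
    Replica leader{1042}, follower{1037};
    std::cout << "follower lag: " << lag(leader, follower) << " messages\n";
    // Equal last-applied IDs imply identical replica contents.
    std::cout << "identical: " << (leader.last_applied == follower.last_applied) << '\n';
}
```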
I think this is a super important point that seems a little glossed over in the article. In my experience, much of the time, you don’t see certain unknowns when you’re planning a solution for a problem. In some cases, it may even be worth it to write some exploratory code first to explore the problem space before codifying possible solutions.
Kudos for doing the hard task of taking a stand. This writing resonates with me because I’m also trying to “downgrade” my tech assets into manageable solutions, as opposed to letting Google/Apple handle everything for me. I’ve been trying to get started on making my own NextCloud setup at home for myself and others, and have recently been looking into things like pmOS for some of my own devices.
Good luck in the journey 🎉 (also there’s a typo in the word “als”).
More power to you. I worry the next generation of kids will know nothing but FAANG, since many (if not all) public schools utilize Google Classroom, post updates on IG and FB, and run Gmail enterprise mail. This is US-specific though, and I am not aware of how it is in EU and Asia.
My generation knew nothing but Big Tech Software™ as well, since many (if not all) public schools utilized Blackboard (where they also posted updates), and ran Microsoft Outlook enterprise mail. Same stuff, different generation. I think we turned out OK.
Agreed that the jump from Blackboard to Microsoft mail to Gmail is not that hard; maybe I did not elaborate my point properly. My bigger fear is that Big Tech Software will weaponize the information they have on my kid in the future. I may be too paranoid, but I think Big Tech has enough information on my kid and her behavior and thought process to be able to subtly manipulate those when she turns an adult. :-(
Consider instead the ironic joke that has circulated a lot in recent years about how it’s actually been the Baby Boom/Gen X parents of today’s young people who used to say “don’t believe everything you see online!” and then… fell into believing everything they saw online, and so have been manipulated into consuming, like addicts, an ever-worsening spiral of outrage content pushed on them by algorithms which optimized for that.
The younger generations have grown up aware of the fact that social media tries to lead them down that path. That alone makes a significant difference in how it affects them.
I sincerely hope that is the case, though based on my personal experience I am not convinced the awareness is where you say it is (but again, this is more of an opinion than something backed by solid data!) :-)
Agreed, I think the next generation is going to suffer a lot because of these dependencies. Because of how deeply rooted Windows/Android/iOS is in the consumer market, businesses are not going to be keen on changing the status quo. The only way to take back control is to start encouraging others and sharing progress or experiences, which I’ll keep doing and always like to read about from others.
IMHO it’s hard to get much out of reading a codebase without necessity. Without a reason why, you won’t do it, or you won’t get much out of it without knowing what to look for.
Yeah, this seems a bit like asking “What’s your favorite math problem?”
I dunno. Always liked 7+7=14 since I was a kid.
Codebases exist to do things. You read a codebase because you want to modify what that is, or to fix it because it’s not doing the thing it’s supposed to. Ideally, my favorite codebase is the one I get value out of constantly but never have to look at. CPU microcode, maybe?
I often find myself reading codebases when looking for examples for using a library I am working with, or to understand how you are supposed to interact with some protocol. Open source codebases can help a lot there. It’s not so much 7 + 7 = 14, but rather 7 + x + y = 23, and I don’t know how to do x or y to get 23, but there are a few common components between the math problems. Maybe one solution can help me understand another?
I completely agree. I do the same thing.
When I am solving a similar problem, or I’m interested in a class of problems, sometimes I find reviewing a codebase very informative. In my mind, what I’m doing is walking through the various things I might want to do and then reviewing the code structure to see how they’re doing it. It’s also bidirectional: a lot of times I see things in the structure and then wonder what sorts of behavior I might be missing.
I’m not saying don’t review any codebases at all. I’m simply pointing out that without context, there’s no qualifiers for one way of coding to be viewed as better or worse than any other. You take the context to your codebase review, whether explicitly or completely inside your mind.
There’s a place for context-free codebase reviews, of course. It’s usually in an academic setting. Everybody should walk through the GoF and functional data structures. You should have experience in a generic fashion working through a message loop or queuing system and writing a compiler. I did and still do, but in the same way I read up on what’s going on in mRNA vaccinations: familiarity. There exists these sorts of things that might help when I need them. I do not necessarily have to learn or remember them, but I have to be able to get them when I want. I know these coding details at a much lower level than I do biology, after all, I’m the guy who’s going to use and code them if I need them. But the real work is matching the problem context up (gradually, of course) with the various implementation systems you might want to use.
There are folks who are great problem-solvers that can’t code. That sucks. There are other folks who can code like the wind but are always putting some obscure yet clever chunk of stuff out and plugging it in somewhere. That also sucks. Good coders should be able to work on both sides of that technical line and move back and forth freely. I review codebases to review how that problem-solving line changed over the years of development, thinking to myself “Where did these guys do too much coding? Too little? Why are these classes or modules set up the way they are (in relation to the problem and maintaining code)?”
That’s the huge value you bring from reviewing codebases: more information on the story of developing inside of that domain. The rest of the coding stuff should be rote: I have a queue, I have a stack, etc. If I want to dive down to that level, start reviewing object interface strategy, perhaps, I’m still doing it inside of some context: I’m solving this problem and decided I need X, here’s a great example of X. Now, start reading and go back to reviewing what they’ve done against the problem you’re solving. Don’t be the guy who brings 4,000 lines of code to a 1 line problem. They might be great lines of code, but you’re working backwards.
Yeah, I end up doing this a lot for e.g. obscure system-specific APIs. Look at projects that’d use it / GH code search, chase the ifdefs.
Great Picard’s Theorem, obvs. I always imagined approaching an essential singularity and seeing all infinity unfold, like a fractal flower, endlessly repeated in every step.
I’d disagree. While sure, one could argue that you just feed a computer what to do, you could make a similar statement about, for example, architecture, where (very simplified) you draw what workers should do and they do it.
Does that mean that architects don’t learn from the work of other architects? I really don’t think so.
But I also don’t think that “just reading” code or copying some “pattern” or “style” from others is what makes you learn. It’s more that if you write code only on your own, or with a somewhat static, like-minded team, your mental constructs don’t really change, while different code bases can challenge your mental model or give you insight into a different mental/architectural model that someone else came up with.
For me that’s not so different from learning different programming languages - like really learning them, not just being able to figure out what it means or doing the same thing you did before with different syntax.
I am sure it’s not the same for everyone, and it surely depends on different learning styles, but I assume that most people commenting here don’t read code like they read a calculation, and I’d never recommend people just “read some code”. It doesn’t work, just like you won’t be a programmer after just reading a book on programming.
It can be a helpful way of reflecting on own programming, but very differently from most code-reviews (real ones, not some theoretical optimal code review).
Another thing, more psychological maybe: I think everyone has seen bad code, even if it’s just some old self-written code from a few years ago. Sometimes it helps motivation to come across the opposite: reading a nice code base lets you visualize a goal. The closer it is to practice the better, in my opinion. I am not so much a fan of examples or example apps, because they might not hold up in real-world code bases, but that’s another topic.
I hope, though, that nobody feels like they need to read code when they don’t feel like it and it gives them nothing. Minds work differently, and forcing yourself to do something often counteracts how much is actually learned.
“Mathematics is not a spectator sport” - I think the same applies to coding.
Well, it varies. Many contributions end up being a grep away and only make you look at a tiny bit of the codebase. Small codebases can be easier to grasp, as can those with implementation overviews (e.g. ARCHITECTURE.md)
I have to agree with this; I’ve found the most improvement comes from contribution, and having my code critiqued by others. Maybe we can s/codebases to study/codebases to contribute to/?
Even if you don’t have to modify something, reading something out of a necessity to understand it makes it stick better (and more interesting) than just reading it for the sake of reading. That’s how I know more about PHP than most people want to know.
Years ago, for my MSc thesis, I was working on a web app profiler. “How can I get the PHP interpreter to tell me every time it enters or exits a function in user code” led to likely a similar level of “I know more about the internals of PHP than I would like” :D
I was looking into this style of error handling last week. Currently the Outcome library looks like the best choice — it can be used standalone or with Boost, only requires C++14, and claims to be quite lightweight.
The upcoming exception refresh in C++2x is going to be similar to these in architecture, but integrated into the language syntax so it looks more like try/catch, and probably faster since the ABI will allow for optimizations like using a CPU flag to indicate whether the return value is a result or an error.
That’s cool! Do you have any more info on the exception refresh (eg. examples)? I’m not seeing how try/catch can work with stuff like StatusOr.
I don’t have the URL it came from, but the working-group document is titled “Zero-overhead Deterministic Exceptions: Throwing Values”, by Herb Sutter.
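For the curious, the headline idea in that paper looks roughly like this (paraphrased from the proposal; none of it compiles today, and the details may well change):

```cpp
// Sketch of P0709 "throwing values" syntax (proposal only, not standard
// C++): a function declared `throws` returns its error through a cheap
// well-known channel (e.g. a CPU flag plus a std::error value) instead
// of stack unwinding.
int safe_divide(int i, int j) throws {
    if (j == 0)
        throw arithmetic_errc::divide_by_zero;  // throws a value, not a heap-allocated object
    return i / j;
}

void caller() {
    try {
        int r = safe_divide(10, 2);  // call sites keep the familiar try/catch shape
        (void)r;
    } catch (std::error e) {
        // handle the error value
    }
}
```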
IIUC, this only works if your NAT has certain ALGs enabled.
Is there a straightforward way to disable the problematic ALGs? I suppose it varies by what router you’re using. I have an Eero system; its firmware is up to date, but the release history doesn’t mention any fixes for something like this.
Is there any easy way to test if a NAT has this enabled or not? Many consumer routers provided by ISPs don’t offer many configuration options.
Hello, I am here to derail the Rust discussion before it gets started. The culprit behind sudo’s vast repertoire of vulnerabilities, and more broadly of bugs in general, is attributable almost entirely to one thing: its runaway complexity.
We have another tool which does something very similar to sudo which we can compare with: doas. The portable version clocks in at about 500 lines of code, its man pages are a combined 157 lines long, and it has had two CVEs (only one of which Rust would have prevented), or approximately one every 30 months.
sudo is about 120,000 lines of code (over 100x more), it’s had 140 CVEs, or about one every 2 months since the CVE database came into being 21 years ago. Its man pages are about 10,000 lines and include the following:
If you want programs to be more secure, stable, and reliable, the key metric to address is complexity. Rewriting it in Rust is not the main concern.
Did you even look at that list? Most of those are not sudo vulnerabilities but issues in sudo configurations distros ship with. The actual list is more like 39, and a number of them are “disputed” and most are low-impact. I didn’t do a full detailed analysis of the issues, but the implication that it’s had “140 security problems” is simply false.
More like 60k if you exclude the regress (tests) and lib directories, and 15k if you exclude the plugins (although the sudoers plugin is 40k lines, and most people use it). Either way, it’s at most half of 120k.
12k, but this also includes various technical documentation (like the plugin API); the main documentation in sudoers(1) is 741 lines, and sudoers(5) is 3,255 lines. Well under half of 10,000.

Except that it only has 10% of the features, or less. This is good if you don’t use them, and bad if you do. But I already commented on this at HN so no need to repeat that here.
You’re right about these numbers being a back-of-the-napkin analysis. But even your more detailed analysis shows that the situation is much graver with sudo. I am going to include plugins, because if they ship, they’re a liability. And their docs, because they felt the need to write them. You can’t just shove the complexity you don’t use and/or like under the rug. Heartbleed brought the internet to its knees because of a vulnerability in a feature no one uses.
And yes, doas has 10% of the features by count - but it has 99% of the features by utility. If you need something in the 1%, what right do you have to shove it into my system? Go make your own tool! Your little feature which is incredibly useful to you is incredibly non-useful to everyone else, which means fewer eyes on it, and it’s a security liability to 99% of systems as such. Not every feature idea has merit. Scope management is important.
Citation needed.
Nobody is shoving anything into your system. The sudo maintainers have the right to decide to include features, and they’ve been exercising that right. You have the right to skip sudo and write your own - and you’ve been exercising that right too.
You’re asking people to undergo the burden of forking or re-writing all of the common functionality of an existing tool just so they can add their one feature. This imposes a great cost on them. Meanwhile, including that code or feature into an existing tool imposes only a small (or much smaller) cost, if done correctly - the incremental cost of adding a new feature to an existing system.
The key phrase here is “if done correctly”. The consensus seems to be that sudo is suffering from poor engineering practices - few or no tests, including with the patch that (ostensibly) fixes this bug. If your software engineering practices are bad, then simpler programs will have fewer bugs only because there’s less code to have bugs in. This is not a virtue. Large, complex programs can be built to be (relatively) safe by employing tests, memory checkers, good design practices, good architecture (which also reduces accidental complexity), code reviews, and technologies that help mitigate errors (whether that be a memory-safe GC-less language like Rust or a memory-safe GC’ed language like Python). Most features can (and should) be partitioned off from the rest of the design, either through compile-time flags or runtime architecture, which prevents them from incurring security or performance penalties.
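As a toy illustration of that partitioning point (all names below are hypothetical): a feature compiled in only on request contributes neither code nor attack surface to builds that don’t want it.

```cpp
#include <iostream>
#include <string>

// Hypothetical optional feature: builds without -DWITH_LDAP contain
// neither the code nor the attack surface of the LDAP path.
#ifdef WITH_LDAP
static bool ldap_lookup(const std::string& name) {
    return !name.empty();  // stand-in for a real LDAP query
}
#endif

static bool local_passwd_lookup(const std::string& name) {
    return name == "root";  // stand-in for a real /etc/passwd lookup
}

bool lookup_user(const std::string& name) {
#ifdef WITH_LDAP
    return ldap_lookup(name);
#else
    return local_passwd_lookup(name);
#endif
}

int main() {
    std::cout << lookup_user("root") << '\n';
}
```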
Software is meant to serve the needs of users. Users have varied use-cases. Distinct use-cases require more code to implement, and thereby incur complexity (although, depending on how good of an engineer one is, additional accidental complexity above the base essential complexity may be added). If you want to serve the majority of your users, you must incur some complexity. If you want to still serve them, then start by removing the accidental complexity. If you want to remove the essential complexity, then you are no longer serving your users.
The sudo project is probably designed to serve the needs of the vast majority of the Linux user-base, and it succeeds at that, for the most part. doas very intentionally does not serve the needs of the vast majority of the Linux user-base. Don’t condemn a project for trying to serve more users than you are.
Serving users is meritorious - or do you disagree?
Yes, but the difference is that these are features people actually use, which wasn’t the case with Heartbleed. Like I mentioned, I think doas is great – I’ve been using it for years and never really used (or liked) sudo because I felt it was far too complex for my needs; before doas I just used su. But I can’t deny that for a lot of other people (mainly organisations, which is the biggest use-case for sudo in the first place) these features are actually useful.
A lot of these things aren’t “little” features, and many interact with other features. What if I want doas + 3 flags from sudo + LDAP + auditing? There are many combinations possible, and writing a separate tool for every one of them isn’t really realistic, and all of this also requires maintenance, and reliable, consistent long-term maintainers are kind of rare.
Yes, I’m usually pretty explicit about which use cases I want to solve and which I don’t want to solve. But “solving all the use cases” is also a valid scope. Is this a trade-off? Sure. But everything here is.
The real problem isn’t so much sudo; but rather that sudo is the de-facto default in almost all Linux distros (often installed by default, too). Ideally, the default should be the simplest tool which solves most of the common use cases (i.e. doas), and people with more complex use cases can install sudo if they need it. I don’t know why there aren’t more distros using doas by default (probably just inertia?)
Tough shit? I want a pony, and a tuba, and a barbie doll…
My entire thesis is that it’s not a valid scope. This fallacy leads to severe and present problems like the one we’re discussing today. You’re begging the question here.
This is an extremely user-hostile attitude to have (and don’t try claiming that telling users with not-even-very-obscure use-cases to write their own tools isn’t user-hostile).
I’ve noticed that some programmers are engineers that try to build tools to solve problems for users, and some are artists that build programs that are beautiful or clever, or just because they can. You appear to be one of the latter, with your goal being crafting simple, beautiful systems. This is fine. However, this is not the mindset that allows you to build either successful systems (in a marketshare sense) or ones that are useful for many people other than yourself, for previously-discussed reasons. The sudo maintainers are trying to build software for people to use. Sure, there’s more than one way to do that (integration vs composition), but there are ways to do both poorly, and claiming the moral high ground for choosing simplicity (composition) is not only poor form but also kind of bad optics when you haven’t even begun to demonstrate that it’s a better design strategy.
A thesis which you have not adequately defended. Your statements have amounted to “This bug is due to sudo’s complexity which is driven by the target scope/number of features that it has”, while both failing to provide any substantial evidence that this is the case (e.g. showing that sudo’s bugs are due to feature-driven essential complexity alone, and not use of a memory-unsafe language, poor software engineering practices (which could lead to either accidental complexity or directly to bugs themselves), or simple chance/statistics) and not actually providing any defense for the thesis as stated. Assume that @arp242 didn’t mean “all” the use cases, but instead “the vast majority” of them - say, enough that it works for 99.9% of users. Why is this “invalid”, exactly? It’s easy for me to imagine the argument being “this is a bad idea”, but I can’t imagine why you would think that it’s logically incoherent.
Finally, you have repeatedly conflated “complexity” and “features”. Your entire argument is, again, invalid if you can’t show that sudo’s complexity is purely (or even mostly) essential complexity, as opposed to accidental complexity coming from being careless etc.
I don’t think “users (distros) make a lot of configuration mistakes” is a good defence when arguing about whether complexity is the issue.
But I do agree about feature set. And I feel like arguing against complexity for safety is wrong (like ddevault was doing), because systems inevitably grow complex. We should still be able to build safe, complex systems. (Hence why I’m a proponent of language innovation and ditching C.)
It’s silly stuff like (ALL : ALL) NOPASSWD: ALL. “Can run sudo without a password” seems like a common theme: some shell injection is found in the web UI and because the config is really naïve (which is definitely not the sudo default) it’s escalated to root.
Others aren’t directly related to sudo configuration as such; for example this one has a Perl script which is run with sudo that can be exploited to run arbitrary shell commands. This is also a common theme: some script is run with sudo, but the script has some vulnerability and is now escalated to root as it’s run with sudo.
I didn’t check all of the issues, but almost all that I checked are one of the above; I don’t really see any where the vulnerability is caused directly by the complexity of sudo or its configuration; it’s just that running anything as root is tricky: setuid returns 432 results, three times that of sudo, and I don’t think that anyone can argue that setuid is complex or that setuid implementations have been riddled with security bugs.
Others just mention sudo in passing, by the way; this one is really about an unrelated remote exec vulnerability, and just mentions “If QCMAP_CLI can be run via sudo or setuid, this also allows elevating privileges to root”. And this one isn’t even about sudo at all, but about a “sudo mode” plugin for TYPO3, presumably to allow TYPO3 users some admin capabilities without giving away the admin password. And who knows why this one is even returned in a search for “sudo”, as it’s not mentioned anywhere.
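To make that recurring pattern concrete, here is a hypothetical sketch (every name in it is invented for illustration, and it comes from no specific CVE) of the kind of helper those reports describe. The vulnerability is ordinary shell injection inside the helper; sudo merely upgrades it to root:

```c
/* backup-helper.c -- hypothetical example, not from any real CVE.
 * Imagine sudoers whitelists this so web-app users can run it as root. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <archive-name>\n", argv[0]);
        return 1;
    }

    char cmd[512];
    /* BAD: argv[1] is interpolated into a shell command unquoted.
     * An argument like "x; rm -rf /" runs the injected command,
     * as root, since the helper is run under sudo. */
    snprintf(cmd, sizeof cmd, "tar czf /backups/%s.tar.gz /var/www", argv[1]);
    return system(cmd);
}
```

The bug isn’t in sudo (or doas) at all, which is the point: any privilege-granting mechanism turns the helper’s small flaw into a root compromise.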
This is comparing apples to oranges. setuid affects many programs, so obviously it would have more results than a single program would. If you’re going to attack my numbers then at least run the same logic over your own.
It is comparing apples to apples, because many of the CVEs are about other program’s improper sudo usage, similar to improper/insecure setuid usage.
Well, whatever we’re comparing, it’s not making much sense.
But, if you’re trying to bring this back and compare it with my 140 CVE number, it’s still pretty damning for sudo. setuid is an essential and basic feature of Unix, which cannot be made any smaller than it already is without sacrificing its essential nature. It’s required for thousands of programs to carry out their basic premise, including both sudo and doas! sudo, on the other hand, can be made much simpler and still address its most common use-cases, as demonstrated by doas’s evident utility. It also has a much smaller exposure: one non-standard tool written in the 80’s and shunted along the timeline of Unix history ever since, compared to a standardized Unix feature introduced by DMR himself in the early 70’s. And setuid somehow has only 4x the number of footgun incidents? sudo could do a hell of a lot better, and it can do so by trimming the fat - a lot of it.
It’s not because it’s hard to use, it’s just that its usage can escalate other, more (relatively) benign security problems, just like setuid can. This is my point, as a reply to stephank’s comment. This is inherent to running anything as root, with setuid, sudo, or doas, and why we have capabilities on Linux now. I bet that if doas were the default instead of sudo we’d have a bunch of CVEs about improper doas usage now, because people do stupid things like allowing anyone to run anything without a password and then writing a shitty web UI in front of that. That particular problem is not doas’s (or sudo’s) fault, just as cutting myself with the kitchen knife isn’t the knife’s fault.
Yes, sudo has had more issues in total; I never said it doesn’t. It’s just a lot lower than what you said, and quite a number are very low-impact, so I just disputed the implication that sudo is a security nightmare waiting to happen: its track record isn’t all that bad. As always, more features come with more (security) bugs, but use cases do need solving somehow. As I mentioned, it’s a trade-off.
We already agreed on this yesterday on HN, which I repeated here as well; all I’m adding is “but sudo is still useful, as it solves many more use cases” and “sudo isn’t that bad”.
Interesting thing to note: sudo was removed from OpenBSD by the same developer who maintains sudo. I think he’ll agree that sudo is “too complex to be the default”, which we already agree on, but not that sudo is “too complex to exist”, which is where we don’t agree.
Could sudo be simpler or better architected to contain its complexity? Maybe. I haven’t looked at the source or use cases in-depth, and I’m not really qualified to make this judgement.
I think arguing against complexity is one of the core principles of UNIX philosophy, and it’s gotten us quite far on the operating system front.
If simplicity were a goal in sudo, this particular vulnerability would not have been possible to trigger: why have sudoedit in the first place, when it just implies the -e flag? That statement is a guarantee.
If it had ditched C, there is no guarantee that this issue wouldn’t have happened.
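For what it’s worth, the sudoedit name works via argv[0] dispatch: the same binary behaves differently depending on the name it was invoked under. A minimal sketch of the technique (not sudo’s actual code, which is considerably more involved):

```c
/* argv0-dispatch.c -- minimal sketch of name-based dispatch. */
#include <libgen.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    /* POSIX basename() may modify its argument; argv[0] is writable. */
    int edit_mode = argc > 0 && strcmp(basename(argv[0]), "sudoedit") == 0;

    /* ... a real tool would also set edit_mode when parsing -e ... */
    printf("edit mode: %s\n", edit_mode ? "on" : "off");
    return 0;
}
```

Two code paths reaching the same feature is exactly the sort of redundancy being objected to here.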
If even the distros can’t understand the configuration well enough to get it right, what hope do I have?
OK maybe here’s a more specific discussion point:
There can be logic bugs in basically any language, of course. However, the following classes of bugs tend to be steps in major exploits:
It is hard to deny that, in a universe where nobody ever messed up those two points, there are a lot less nasty exploits in the world in systems software in particular.
Many other toolchains have decided to make the above two issues almost non-existent through various techniques. A bunch of old C code doesn’t handle this. Is there not something that can be done here to get the same productivity and safety advantages found in almost every other toolchain for tools that form the foundation of operating computers? Including a new C standard or something?
I can have a bunch of spaghetti code in Python, but turning that spaghetti into “oh wow, argv contents ran over some other variables and messed up the internal state machine” is a uniquely C problem. If everyone else can find solutions, I feel like C could as well (including introducing new mechanisms to the language; we are not bound by what is printed in some 40-year-old books, and #ifdef is a thing).
EDIT: forgot to mention this: I do think that sudo is a bit special given that its default job is to take argv contents and run them, so I kinda agree that sudo is a bit special in terms of exploitability. But hey, the logic bugs by themselves weren’t enough to trigger the bug. When you have a multi-step exploit, anything on the path getting stopped is sufficient, right?
+1. Lost in the noise of “but not all CVEs…” is the simple fact that this CVE comes from an embarrassing C string fuckup that would be impossible, or at least caught by static analysis, or at the very least caught at runtime, in most other languages. If “RWIIR” is flame bait, then how about “RWIIP” or at least “RWIIC++”?
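For readers who haven’t dug into the advisory: the class of bug at issue looks roughly like the sketch below. This is a simplified rendering of the pattern as described in public analyses, not the verbatim sudo source: an unescaping loop whose check can be talked past the terminating NUL.

```c
/* unescape.c -- simplified sketch of the bug class, not sudo's code. */
#include <ctype.h>
#include <stdio.h>

static void unescape_into(char *dst, const char *src)
{
    while (*src != '\0') {
        if (src[0] == '\\' && !isspace((unsigned char)src[1]))
            src++;   /* BUG: if src ends in a lone backslash, src[1] is
                        the NUL, isspace(NUL) is false, so we step onto
                        the NUL, copy it, and the loop then reads and
                        writes past the ends of both buffers. */
        *dst++ = *src++;
    }
    *dst = '\0';
}

int main(void)
{
    char out[64];
    unescape_into(out, "a\\ b");        /* fine: backslash is kept */
    /* unescape_into(out, "ends-in\\");    out-of-bounds: the bug  */
    puts(out);
    return 0;
}
```

A bounds-checked language (or even an assert-heavy, fuzzed C codebase) turns this from silent heap corruption into an immediate, loud failure.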
I be confused… what does the P in RWIIP mean?
Pascal?
Python? Perl? Prolog? PL/I?
Probably Python, given the content of the comment by @rtpg. Python is also memory-safe, while it’s unclear to me whether Pascal is (a quick search reveals that at least FreePascal is not memory-safe).
Were it not for the relative (accidental, non-feature-providing) complexity of Python to C, I would support RWIIP. Perhaps Lua would be a better choice - it has a tiny memory and disk footprint while also being memory-safe.
That’s possibly it.
Not to mention that Lua – even when used without LuaJIT – is simply blazingly fast compared to other scripting languages (Python, Perl, &c)!
For instance, see this benchmark I did some time ago: https://0x0.st/--3s.txt. I had implemented Ackermann’s function in various languages (the “./ack” file is the one in C) to get a rough idea of their execution speed, and lo and behold Lua turned out to be second only to the C implementation.
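For reference, the C version described there (“./ack”) would be along these lines; the exact arguments used in the original benchmark aren’t stated, so A(3, 10) is just a plausible choice:

```c
/* ack.c -- naive recursive Ackermann, the usual benchmark form. */
#include <stdio.h>

static unsigned long ack(unsigned long m, unsigned long n)
{
    if (m == 0) return n + 1;
    if (n == 0) return ack(m - 1, 1);
    return ack(m - 1, ack(m, n - 1));
}

int main(void)
{
    printf("A(3, 10) = %lu\n", ack(3, 10));  /* 8189 */
    return 0;
}
```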
I agree that rewriting things in Rust is not always the answer, and I also agree that simpler software makes for more secure software. However, I think it is disingenuous to compare the overall CVE count for the two programs. Would you agree that sudo is much more widely installed than doas (and therefore is a larger target for security researchers)? Additionally, most of the 140 CVEs linked were filed before October 2015, which is when doas was released. Finally, some of the linked CVEs aren’t even related to code vulnerabilities in sudo, such as the six Quest DR Series Disk Backup CVEs (example).
I would agree that sudo has a bigger target painted on its back, but it’s also important to acknowledge that it has a much bigger back - 100× bigger. However, I think the comparison is fair. doas is the default in OpenBSD and very common on NetBSD and FreeBSD systems as well, which are at the heart of a lot of high-value operations. I think it’s over the threshold where we can consider it a high-value target for exploitation. We can also consider the kinds of vulnerabilities which have occurred internally within each project, without comparing their quantity to one another, to characterize the sorts of vulnerabilities which are common to each project, and ascertain something interesting while still accounting for differences in prominence. Finally, there’s also a bias in the other direction: doas is a much simpler tool, shipped by a team famed for its security prowess. Might this not dissuade it as a target for security researchers just as much?
Bonus: if for some reason we believed that doas was likely to be vulnerable, we could conduct a thorough audit on its 500-some lines of code in an hour or two. What would the same process look like for sudo?
What?
So you’re saying that 50% of the CVEs in doas would have been prevented by writing it in Rust? Seems like a good reason to write it in Rust.
Another missing point is that Rust is only one of many memory-safe languages. sudo doesn’t need to be particularly performant or free of garbage collection pauses. It could be written in your favorite GCed language like Go, Java, Scheme, Haskell, etc. Literally any memory-safe language would be better than C for something security-critical like sudo, whether we are trying to build a featureful complex version like sudo or a simpler one like doas.
Indeed. And you know, Unix in some ways has been doing this for years anyway with Perl, Python, and shell scripts.
I’m not a security expert, so I’d be happy to be corrected, but if I remember correctly, using secrets safely in a garbage-collected language is not trivial. Once you’ve finished working with some secret, you don’t necessarily know how long it will remain in memory before it’s garbage collected, or whether it will be securely deleted or just ‘deallocated’ and left in RAM for the next program to read. There are ways around this, such as falling back to manual memory control for sensitive data, but as I say, it’s not trivial.
That is true, but you could also do the secrets handling in a small library written in C or Rust and FFI with that, leaving the rest of your bog-standard logic not beholden to the issues that habitually plague every non-trivial C codebase.
Agreed.
Besides these capabilities, ideally a language would also have ways of expressing important security properties of code. For example, ways to specify that a certain piece of data is secret and ensure that it can’t escape and is properly overwritten when going out of scope instead of simply being dropped, and ways to specify a requirement for certain code to use constant time to prevent timing side channels. Some languages are starting to include things like these.
Meanwhile when you try to write code with these invariants in, say, C, the compiler might optimize these desired constraints away (overwriting secrets is a dead store that can be eliminated, the password checker can abort early when the Nth character of the hash is wrong, etc) because there is no way to actually express those invariants in the language. So I understand that some of these security-critical things are written in inline assembly to prevent these problems.
I believe that explicit_bzero(3) largely solves this particular issue in C.
Ah, yes, thanks!
It looks like it was added to glibc in 2017. I’m not sure if I haven’t looked at this since then, if the resources I was reading were just not up to date, or if I just forgot about this function.
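A minimal sketch of the difference (assuming glibc ≥ 2.25 or one of the BSDs, where explicit_bzero is declared in string.h): the plain memset is a dead store the optimizer may delete, because the buffer is never read again; explicit_bzero is guaranteed to perform the write.

```c
/* scrub.c -- clearing secrets: dead store vs. explicit_bzero. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char pw[128] = "hunter2";          /* pretend this was read securely */
    /* ... verify the password ... */

    memset(pw, 0, sizeof pw);          /* may be optimized away entirely */
    explicit_bzero(pw, sizeof pw);     /* guaranteed to clear the buffer */

    printf("%d\n", pw[0]);             /* prints 0 */
    return 0;
}
```

(In real code you’d use one or the other; both appear here only for contrast.)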
I do think high complexity is the source of many problems in sudo and that doas is a great alternative to avoid many of those issues.
I also think sudo will continue being used by many people regardless. If somebody is willing to write an implementation in Rust which might be just as complex but ensures some level of safety, I don’t see why that wouldn’t be an appropriate solution to reducing the attack surface. I certainly don’t see why we should avoid discussing Rust just because an alternative to sudo exists.
Talking about Rust as an alternative is missing the forest for the memes. Rust is a viral language (in the sense of internet virality), and a brain worm that makes us all want to talk about it. But in actual fact, C is not the main reason why anything is broken - complexity is. We could get much more robust and reliable software if we focused on complexity, but instead everyone wants to talk about fucking Rust. Rust has its own share of problems, chief among them its astronomical complexity. Rust is not a moral imperative, and not even the best way of solving these problems, but it does have a viral meme status which means that anyone who sees through its bullshit has to proactively fend off the mob.
Offering opinions as facts. The irony of going on to talk about seeing through bullshit.
I don’t understand why you hate Rust so much but it seems as irrational as people’s love for it. Rust’s main value proposition is that it allows you to write more complex software that has fewer bugs, and your point is that this is irrelevant because the software should just be less complex. Well I have news for you, software is not going to lose any of its complexity. That’s because we want software to do stuff, the less stuff it does the less useful it becomes, or you have to replace one tool with two tools. The ecosystem hasn’t actually become less complex when you do that, you’re just dividing the code base into two chunks that don’t really do what you want. I don’t know why you hate Rust so much to warrant posting anywhere the discussion might come up, but I would suggest if you truly cannot stand it that you use some of your non-complex software to filter out related keywords in your web browser.
Agree with what you’ve wrote, but just to pick at a theme that’s bothering me on this thread…
This is obviously very subjective, and everything below is anecdotal, but I don’t agree with this equivalence.
In my own experience, everyone I’ve met who “loves” or is at least excited about rust seems to feel so for pretty rational reasons: they find the tech interesting (borrow checking, safety, ML-inspired type system), or they enjoy the community (excellent documentation, lots of development, lots of online community). Or maybe it’s their first foray into open source, and they find that gratifying for a number of reasons. I’ve learned from some of these people, and appreciate the passion for what they’re doing. Not to say they don’t exist, but I haven’t really seen anyone “irrationally” enjoy rust - what would that mean? I’ve seen floating around a certain spiteful narrative of the rust developer as some sort of zealous online persona that engages in magical thinking around the things rust can do for them, but I haven’t really seen this type of less-than-critical advocacy any more for rust than I have seen for other technologies.
On the other hand I’ve definitely seen solid critiques of rust in terms of certain algorithms being tricky to express within the constraints of the borrow checker, and I’ve also seen solid pushback against some of the guarantees that didn’t hold up in specific cases, and to me that all obviously falls well within the bounds of “rational”. But I do see a fair amount of emotionally charged language leveled against not just rust (i.e. “bullshit” above) but the rust community as well (“the mob”), and I don’t understand what that’s aiming to accomplish.
I agree with you, and I apologize if it came across that I think Rust lovers are irrational - I for one am a huge Rust proselytizer. I intended for the irrationality I mentioned to be the perceived irrationality DD attributes to the Rust community.
Definitely no apology needed, and to be clear I think the rust bashing was coming from elsewhere, I just felt like calling it to light on a less charged comment.
I think the criticism isn’t so much that people are irrational in their fondness of Rust, but rather that there are some people who are overly zealous in their proselytizing, as well as a certain disdain for everyone who is not yet using Rust.
Here’s an example comment from the HN thread on this:
There are a lot of things to say about this comment - it’s entirely false IMO - but above all it’s not exactly a nice comment. And why Rust? Why not Go? Or Python? Or Zig? Or something else?
Here’s another one:
The disdain is palpable in this one, and “Rust is modernized C” really misses the mark IMO; Rust has a vastly different approach. You can consider this a good or bad thing, but it’s really not the only approach towards memory-safe programming languages.
Of course this is not representative for the entire community; there are plenty of Rust people that I like and have considerably more nuanced views – which are also expressed in that HN thread – but these comments certainly are frequent enough to give a somewhat unpleasant taste.
While I don’t approve of the deliberately inflammatory form of the comments, and don’t agree with the general statement that all complexity is eliminable, I personally agree that, in this particular case, simplicity > Rust.
As a thought experiment: world 1 uses sudo-rs as its default implementation of sudo, while world 2 uses the 500 lines of C that are doas. I do think that world 2 would be generally more secure. Sure, it’ll have more segfaults, but fewer logical bugs.
I also think that the vast majority of world 2 populace wouldn’t notice the absence of advanced sudo features. To be clear, the small fraction that needs those features would have to install sudo, and they’ll use the less tested implementation, so they will be less secure. But that would be more than offset by improved security of all the rest.
Adding a feature to a program always has a cost for those who don’t use that feature. If the feature is obscure, it might be overall more beneficial to have a simple version which is used by 90% of the people, and a complex one for the remaining 10%. The 10% would be significantly worse off in comparison to the unified program. The 90% would be slightly better off. But 90% >> 10%.
I argue that it’s actually that it allows you to write fast software with fewer bugs. I’m not entirely convinced that Rust allows you to manage complexity better than, say, Common Lisp.
Exactly. Software is written for people to use. (Technically, only some software - other software (such as demoscene productions) is written for the beauty of it, or the enjoyment of the programmer; but in this discussion we only care about the former.)
Even worse - it becomes more complex. Now that you have two tools, you have two userbases, two websites, two source repositories, two APIs, two sets of file formats, two packages, and more. If the designs of the tools begin to differ substantially, you have significantly more ecosystem complexity.
You’re right about Rust’s value proposition; I should have added performance to that sentence. Or, I should have just said a managed language, because, as another commenter pointed out, Rust is almost irrelevant to this whole conversation when it comes to preventing this type of CVE.
The other issue is that it is a huge violation of principle of least privilege. Those other features are fine, but do they really need to be running as root?
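To sketch what honoring least privilege would look like in practice (this is the generic C pattern, not a description of sudo’s actual architecture, and the uid/gid values are illustrative): do the one step that needs root, then drop to an unprivileged uid/gid before running anything complex.

```c
/* drop.c -- generic privilege-drop pattern; the order matters. */
#include <grp.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void drop_privileges(uid_t uid, gid_t gid)
{
    if (setgroups(0, NULL) != 0 ||  /* clear supplementary groups        */
        setgid(gid) != 0 ||         /* gid first, while we're still root */
        setuid(uid) != 0) {         /* uid last; this can't be undone    */
        perror("drop_privileges");
        exit(1);
    }
}

int main(void)
{
    /* ... do the single privileged operation here, while still root ...
     * (the program must start as root for the drop to succeed) */
    drop_privileges(65534, 65534);  /* e.g. nobody/nogroup (illustrative) */
    /* ... logging, plugins, TLS, etc. would all run unprivileged ... */
    printf("now uid %d\n", (int)getuid());
    return 0;
}
```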
Just to add to that: In addition to having already far too much complexity, it seems the sudo developers have a tendency to add even more features: https://computingforgeeks.com/better-secure-new-sudo-release/
Plugins, integrated log server, TLS support… none of which are things I’d want in a tool that should be simple and is installed as suid root.
(Though I don’t think complexity vs. memory safety are necessarily opposed solutions. You could easily imagine a sudo-alike tool that is written in Rust and does not come with unnecessary complexity.)
What’s wrong with EBNF and how is it related to security? I guess you think EBNF is something the user shouldn’t need to concern themselves with?
There’s nothing wrong with EBNF, but there is something wrong with relying on it to explain an end-user-facing domain-specific configuration file format for a single application. It speaks to the greater underlying complexity, which is the point I’m making here. Also, if you ever have to warn your users not to despair when reading your docs, you should probably course correct instead.
The point that you made in your original comment is that sudo has too many features (disguising it as a point about complexity). The manpage snippet that you’re referring to has nothing to do with features - it’s a mix between (1) the manpage being written poorly and (2) a bad choice of configuration file format resulting in accidental complexity increase (with no additional features added).
EBNF as a concept aside; the sudoers manpage is terrible.
I am not sure what you are trying to say. Let me guess: runaway complexity? Something else maybe?
Technically I agree with both, though my arguments for the former are most decidedly off-topic.
Taking Drew’s statement at face value: There’s about to be another protracted, pointless argument about rewriting things in rust, and he’d prefer to talk about something more practically useful?
I don’t understand why you would care about preventing a protracted, pointless argument on the internet. Seems to me like trying to nail jello to a tree.
This is a great opportunity to promote doas. I use it everywhere these days, and though I don’t consider myself any sort of Unix philosophy purist, it’s a good example of “do one thing well”. I’ll call out Ted Unangst for making great software. Another example is signify. Compared to other signing solutions, there is much less complexity, much less attack surface, and a far shallower learning curve.
I’m also a fan of tinyssh. It has almost no knobs to twiddle, making it hard to misconfigure. This is what I want in security-critical software.
Relevant link: Features Are Faults.
All of the above is orthogonal to choice of implementation language. You might have gotten a better response in the thread by praising doas and leaving iron oxide out of the discussion. ‘Tis better to draw flies with honey than with vinegar. Instead, you stirred up the hornets’ nest by preemptively attacking Rust.
PS. I’m a fan of your work, especially Sourcehut. I’m not starting from a place of hostility.
Why can’t we have the best of both worlds? Essentially a program copying the simplicity of doas, but written in Rust.
Note that both sudo and doas originated in OpenBSD. :)
Got a source for the former? I’m pretty sure sudo well pre-dates OpenBSD.
The current maintainer is also an OpenBSD contributor, but he started maintaining sudo in the early 90s, before OpenBSD forked from NetBSD. I don’t know when he started contributing to OpenBSD.
So I don’t think it’s fair to say that sudo originated in OpenBSD :)
Ah, looks like I was incorrect. I misinterpreted OpenBSD’s innovations page. Thanks for the clarification!
I’d be sold, if the published part was static.
Well, sqlite is almost as static as your journaling filesystem.
My static webserver disagrees.
You trade that for having all of the data in sqlite.
I think the design choice is that sqlite can store tons of small files in a single file, with FTS.
Yes, you get full-text search now, but sqlite has loads of benefits over plain files.
Why does this have to be a trade?
The CMS and the resulting website do not need to both be dynamic and stored in sqlite.
I’d personally prefer running the CMS somewhere private, and hosting the (static) website just about anywhere.
Then go use something like Hugo, etc. This lets you edit live, which is a non-trivial task when the files are 100% static. There is a trade-off for this.
Personally I just use Fossil-scm for websites these days. I can host files, edit locally or remotely (live or not), sync changes, etc. It also is just a sqlite file.
There doesn’t have to be. There’s no need for the CMS to actually dynamically render the resulting website for the public.
An export button that creates a static website would be good enough. An automatically updated export would be excellent.
I can’t imagine Go + sqlite not being fast enough for almost any website that would ever use something like this, even doing the rendering on every page load.
I haven’t looked at how this is coded, but if that is required for load reasons, then a cache would probably be a better solution.
But you are right, one could store the rendered version in sqlite (or even out on temp disk), for example.
But you are correct, technically, sure. I just don’t see any reason to bother with the added complexity.
Just turning on your nginx caching option (or other front-end handling the TLS cert) would almost certainly be 99% easier and achieve basically the same effect.
Static is simpler.
Again, static is simpler.
Now that’s insane. A cache frontend for a backend that’s dynamically-rendering static content.
Instead of just serving the static content directly.
Than serving static content? You can’t be serious.
Yes, static content is simpler, but is it simpler for this project, the way it’s implemented currently? I’d argue no.
You have to render the content at some point. You have effectively two choices: you can render at save time and create, effectively, a static site, or you can render at load time.
You seem to think rendering at save time is the better choice. Then you would save both copies, the raw MD file and the rendered HTML, and put them both in your sqlite DB. Then at HTTP GET time, you can simply stream the rendered version from the sqlite file. (Alternatively you could store the rendered content out in some file somewhere, I guess, complicating the code even further. sqlite is often faster than open() anyway, so I’d argue that point also.)
The problem is, it’s easy to have cache and sync issues. If you render at save time, and there is exactly one way to edit, then the cache and sync issues basically go away. But there is more than one way to edit: you can edit using anything that can speak sqlite, or by calling the sqlite CLI directly, or by using the web interface. The big feature of this code base is the ‘live edit’ feature of the CMS, so one could punt the problem and save two copies in sqlite, the raw MD and the rendered version, and if you are so inclined to edit outside of the live-edit session, then it’s your problem to update the cache also.
Alternatively, do it at read time (HTTP GET), and save yourself the headache of cache and sync issues. This is the simpler version. Seriously, it is. It was one sentence, vs. a paragraph for rendering at save time.
Complication matters. Static seems simpler, but is it really? Not always.
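To illustrate how thin that read path actually is, here is a sketch in C against the sqlite3 API (the table and column names - pages, slug, rendered - are invented for illustration; this is not rwtxt’s schema):

```c
/* serve.c -- read a pre-rendered page from a read-only sqlite file. */
#include <sqlite3.h>
#include <stdio.h>

static int print_page(const char *db_path, const char *slug)
{
    sqlite3 *db;
    sqlite3_stmt *stmt;

    if (sqlite3_open_v2(db_path, &db, SQLITE_OPEN_READONLY, NULL) != SQLITE_OK)
        return 1;

    if (sqlite3_prepare_v2(db, "SELECT rendered FROM pages WHERE slug = ?",
                           -1, &stmt, NULL) != SQLITE_OK) {
        sqlite3_close(db);
        return 1;
    }
    sqlite3_bind_text(stmt, 1, slug, -1, SQLITE_STATIC);

    if (sqlite3_step(stmt) == SQLITE_ROW)   /* found: dump the HTML */
        fputs((const char *)sqlite3_column_text(stmt, 0), stdout);

    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}

int main(int argc, char **argv)
{
    return argc == 3 ? print_page(argv[1], argv[2]) : 1;
}
```

Morally this is open()+read() with a B-tree lookup in front of it; whether you call that “dynamic” is mostly a matter of where you draw the line.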
Sure, but reading from sqlite is dynamic. Whereas any static webserver can serve a plain file. I prefer static webservers, as they are the simplest. This means low LoC count, which means easy to understand/audit/sandbox.
Specifically, I use OpenBSD’s httpd, and I would like to eventually move away from UNIX for my public services whenever possible (e.g. to a component system on top of seL4). A static website served from plain files is best.
Changing the goalposts again, that’s fine I can meet your goalpost here too :)
Reading from a static sqlite file isn’t any more dynamic than open() on a file. They are both reading effectively static content in the above scenario of rendered content.
I agree that from a security point of view something like seL4 will be more secure, for some definitions of secure, but at some point we are just messing around and not actually solving problems.
What are the security risks you are trying to protect against? Without identifying the risks, it’s very hard to mitigate them. Otherwise we are just playing security theatre games, for zero benefit.
What’s the worst-case scenario here, that someone manages to get into a static content web server? Given proper permissions: nothing. If they get in, rootkit the box, and get write access, the situation is worse, but again, given static content the likes of a personal project, the consequences are equally trivial, I imagine.
Anyways, you didn’t refute any of my statements above, so I’m glad we finally agree, static is not always better, or even simpler.
Like I mentioned way up thread, I like live-edit and I’m very lazy. I just use Fossil-scm for my websites. They are technically dynamic now, but it’s amazingly easy to make them go; I even get a built-in forum and bug-tracking for feedback, email notifications, etc. I get 100% revision and history control, and it’s auditable, offline-sync capable, live-edit capable, etc. Plus deployment is super easy: a single binary that does everything, and backups are equally trivial as it’s a single sqlite file. Because of offline capabilities, I generally have a few copies lying about anyway, and it’s all cross-platform.
My webserver doesn’t support serving from static sqlite files. Dynamic as in, I’d have to run CGI code in addition to my webserver.
Me too, thus I’d love a CMS. It’s just, while dynamic is good for me (the one writing articles), it is unnecessary for viewers. I currently use a static site generator, which takes Markdown as input.
I do not wish to change the setup of my public site, which is a static site.
On my (public) personal sites, I simply want to minimize complexity, which should mitigate a broad range of security risks. It’s a win/win strategy.
Your webserver could support serving from sqlite files, if it so chose. That’s basically all rwtxt is doing.
Anyways, I feel like we aren’t having a productive conversation anymore. I already covered your supposed complexity in a previous comment.
edit: also in this thread, but in a different comment chain, I commented on security analysis, which also applies here.
you could spider the dynamic version of the site quickly with httrack or something to produce a static version.
> Just turning on your nginx caching option (or other front-end handling the TLS cert) would almost certainly be 99% easier and achieve basically the same effect.
In terms of the security analysis it’s a completely different thing to have a dynamic application running exposed to the internet, even if you cache it.
OK, I think I get your point(that in security complexity hurts you), but I think we have very different understandings of security analysis, so I’m going to write some stuff here.
You can’t talk about security mitigations without talking about the specific risks you are trying to eradicate.
Nginx is dynamic. OpenBSD’s httpd is dynamic. Any “static webserver” is dynamically taking inputs (HTTP GET requests) and mapping them to files out on a filesystem. Nothing is stopping nginx from serving /etc/shadow or /home/rain1/mysupersecretthinghere, except some configuration and (hopefully) some file permissions.
This is no different than program X taking an HTTP get, opening a sqlite file and serving the results out of a column. It’s totally doable and equally “dynamic” for the most part.
I think what you are trying to say is, if rwtxt (since we are in the rwtxt thread) happens to be compromised (and I get complete control of its memory), I can convince rwtxt to render to you whatever I want.
Except the same is true of any other webserver. If I compromise nginx serving files from a filesystem (in the same way as above), I can also have it render to you whatever I want.
There is basically no difference, from a security analysis point of view, between rwtxt and a sqlite file versus nginx and an HTML file. Both files can be read-only from the web server’s perspective; of course then rwtxt’s live edit will not work, but hey, we are trying to be SECURE, DAMMIT! lol.
The difference here, from a security analysis perspective, is that nginx is way, way more popular than rwtxt (today; who knows about tomorrow), so finding a complete compromise of nginx is, one hopes, much, much harder than finding one in rwtxt, a tiny project mostly (completely? – didn’t look) written by one person. Of course the opposite is also true: way more bad people are looking at how to compromise nginx than rwtxt; there is something to be said for security through obscurity, in a vague sort of hand-wavy way… as long as you are not an active target of bad people.
Hence why we go back to: you can’t talk about practical security mitigations without talking about the specific risks you are trying to eradicate.
So mostly your sentence makes no sense from a security analysis perspective.
OK soap box over.
You may be able to get most of the benefits of static publishing by putting a CDN/caching layer in front of the CMS.
That’d make hosting the actual website more complicated, rather than easier. Adding layers isn’t the solution.
I do not wish to expose the CMS to the general public. The CMS is relevant to webmasters (authors) only.
I just need an export button I can press anytime to get a static site.
Why is that important for you?
Alternatively, Learn X in Y Minutes has a similar guide for Rust and other programming languages.
Please, please, please don’t comment code as per the “good” code example. In my opinion, comments should explain the “why” or other non-obvious design decisions, instead of merely reiterating what the code is doing.
Yeah, the example is uncompelling and I would not look kindly on it in a review.
That said, projects like GCC have comments of the “it does this” nature, and they are immensely useful because it is usually not obvious what the code is, in fact, doing. The reasons for this are legion, but even something seemingly simple benefits from basic comments, because you often end up jumping into this code from something that is essentially unrelated. Without those kinds of comments, you would end up spending an incredible amount of time getting to know the module (which is often very complicated) just to extract what tends to be tangential, yet important, information.
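A tiny example of the distinction, using hypothetical CSV-handling code: the first comment merely restates the code, while the second records the reason you would otherwise have to go hunting for.

```c
/* bom.c -- "what" comments vs. "why" comments. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char data[] = "\xEF\xBB\xBFname,age";   /* input with a UTF-8 BOM */
    char *buf = data;

    /* "What" (adds nothing the code doesn't already say):
       if the first three bytes are EF BB BF, advance buf by three. */

    /* "Why" (what the parent comment asks for):
       some Windows editors prepend a UTF-8 BOM; strip it here or the
       first CSV field downstream parses as "\xEF\xBB\xBFname". */
    if (strlen(buf) >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0)
        buf += 3;

    printf("%s\n", buf);
    return 0;
}
```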
Nice savings! I’m a little confused as to the use of PubSub for the import/export process though. IIUC Elasticsearch already has existing methods to migrate indexes. Would it have been even cheaper to use these methods and cut the PubSub cost out of the migration process?