Threads for teymour

    1. 1

      If you’re using Postgres, I’m really curious to hear why a UUID would be faster than serial anyway.

      1. 7

        I don’t think the article said either version of UUID was faster than serial. It avoided discussing serial in detail because making serial IDs public can increase the system’s vulnerability to bad actors:

        UUIDv4 IDs have properties that make them suitable as public identifiers, unlike traditional auto-incremented integers.

        • They don’t disclose business information, e.g., number of orders so far.
        • They are not guessable: one cannot enumerate all identifiers to discover resources.
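
        For instance, a quick illustration with Python’s standard uuid module:

        import uuid

        # A v4 UUID carries 122 random bits, so IDs are unordered and cannot
        # feasibly be enumerated or guessed.
        print(uuid.uuid4())  # e.g. 9f1c4a3e-8b2d-4f6a-9c0e-2d7b1a5e8f3c
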
        1. 4

          Yes, it is guessable, but then again, if you restrict data resources to their owners, guessing does not give away any information about resources the user does not own.

          Enumeration then also doesn’t make sense (order 1000 out of how many?). If you rely on randomness for security that means you do security by obscurity, which is not optimal.

          You can share an auto-incremented integer by also supplying an access token for that resource. If the access token is not provided, return “resource not found”.
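
          A minimal sketch of that scheme (all names are hypothetical, and an HMAC of the ID is just one way to derive the token):

          import hmac, secrets

          SECRET_KEY = secrets.token_bytes(32)  # server-side secret

          def token_for(order_id: int) -> str:
              # Derive an unguessable per-resource token from the sequential ID.
              return hmac.new(SECRET_KEY, str(order_id).encode(), "sha256").hexdigest()

          def get_order(order_id: int, token: str = ""):
              # A wrong or missing token looks identical to a missing resource.
              if not hmac.compare_digest(token, token_for(order_id)):
                  return "404 resource not found"
              return {"id": order_id}  # stand-in for the real lookup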

          1. 5

            Enumeration then also doesn’t make sense (order 1000 out of how many?).

            Just place two orders, one day (or one hour or one week) apart. Observe how the order ID has changed during that time.

          2. 2

            Enumeration then also doesn’t make sense (order 1000 out of how many?). If you rely on randomness for security that means you do security by obscurity, which is not optimal.

            You can get pretty good estimates for how many based on a series of observations.
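
            For reference, a sketch of the classic serial-number estimate (the German tank problem; it assumes IDs are handed out sequentially and that your observations behave roughly like random draws from 1..N):

            def estimate_total(observed_ids):
                # Frequentist German-tank estimate: N ≈ m + m/k - 1, where m is
                # the largest serial seen and k is the number of samples.
                m, k = max(observed_ids), len(observed_ids)
                return m + m / k - 1

            print(estimate_total([1041, 1107, 1165, 1224]))  # -> 1529.0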

            If you rely on randomness for security that means you do security by obscurity, which is not optimal.

            Not optimal when it comes to a computer system for keeping information secure. I would argue that here the number of x (where in the above example x=#orders) is the information to be kept secret rather than a part of the system?

            1. 1

              The German Tank problem is interesting! I wonder if production variability is accounted for. A software system with clients is not equivalent to a wartime production facility.

              The security part is in response to this part:

              They are not guessable: one cannot enumerate all identifiers to discover resources.

        2. 1

          Yeah I realize it went unaddressed, I’m just curious. It’s true that it leaks data and is guessable.

        3. 1

          IDOR has entered the chat xD

    2. 12

      I want to mention act, a terribly-named tool for running GH Actions locally prior to push. It does not always work, but sometimes it can save time when debugging Actions.

      1. 4

        There’s also gha-runner, which Pernosco (the rr debugger developers) have developed and use to run GitHub Actions workflows on AWS.

    3. 24

      I wish I had written this article. I have an article I’ve been working on that root causes this behavior as “Software Engineering is not Computer Science”. This is not an anti-intellectual position, but an acknowledgement that when we do things like design cars we don’t hire exclusively theoretical physicists even though we fully expect mechanical engineers to know and apply a large amount of physics.

      Also, our industry’s cargo-culting of certain CS101 keywords, while disdaining things on the implementation side, has directly led to some ridiculously slow applications, such as the database performance issues mentioned in the article.

      As an industry we’ve somehow assumed that people smart enough to understand CS concepts are smart enough to apply those concepts in new areas. Perhaps that statement is true, but the reality is all too often they simply don’t. And that’s how we end up with people who simultaneously understand b-trees well enough to implement them but don’t think to review their database’s documentation on “CREATE INDEX”.

      1. 13

        There’s a fun flip side to this phenomenon which is that you can often solve 3SAT or even the halting problem in practice. Obviously one can’t push this too far (and by solve the halting problem I obviously mean solve specific instances of whether a program will halt) but CS undergrads seem to assign mythical difficulties to concrete instances of these problems that are actually surmountable in the real world. There’s a reason 3SAT isn’t used as the basis of a cryptographic protocol.

        1. 12

          One thing I liked about my own undergrad (~20 yrs ago) is that the juxtaposition of two classes helped me avoid this particular pitfall. In Algorithms, SAT was a canonical example of a “hard” problem, and you proved other problems hard through reduction from SAT. But in Artificial Intelligence, SAT was a canonical example of a tractable-in-practice problem, and you could show other problems tractable through… reduction to SAT.

          Alas, symbolic planning is pretty out of favor in 2023, so undergrads are unlikely to run across that particular example in an intro AI class. Maybe a programming languages class introducing formal methods could serve that role.

        2. 2

          This is something that CS undergraduate education covers (using randomized algorithms, which can be used to solve many problems where, as you say, there are pathological cases, but they are only a small fraction of the sample space)?

          1. 5

            What CS undergrad covers and what CS undergrads remember & understand a year or two out of college are overlapping but distinct. Like who the hell remembers how the pumping lemma works without looking it up?

            1. 9

              A lot of computer science degrees seem to labour under the misapprehension that they are there to teach computer science. This is not what a university course is for. A degree is supposed to give you a guided tour of your ignorance such that you understand how little you know and how to learn the things that you later need.

              1. 3

                Sounds like a very PhD thing to say ;)

            2. 3

              The pumping lemma is a lot easier to remember when you realize it boils down to “all sufficiently-long paths on a finite directed graph contain a cycle.”
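
              Spelled out via that pigeonhole view: if a DFA has $n$ states, any accepted word $w$ with $|w| \ge n$ must revisit some state while being read, so $w = xyz$ with $|xy| \le n$ and $|y| \ge 1$, where $y$ is the part read around the cycle; going around the cycle any number of times gives $xy^iz \in L$ for all $i \ge 0$.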

      2. 5

        The article had been in my drafts folder for two months, so you had a chance to beat me to it ;) In any case, it’d be worthwhile to have a parallel take on it too!

        Joking aside, I agree with your response and like the “Software Engineering is not CS” claim. I think it’s spot on. Also, it’s probably true that people smart enough to understand CS concepts can learn the rest… but the question is whether they are actually put in a position where they can learn (proper reviews, mentoring, etc.) and whether they want to learn. I’ve seen too many cases of “two inexperienced people in a corner bouncing code reviews off of each other” where, before you know it, the code is already in prod causing issues. The support structures are rarely there :-/

      1. 2

        It amazes me how often his stuff comes up and once again it’s something I’ve not read but absolutely found interesting. Thanks for sharing that.

      2. 1

        Yes absolutely, writing is thinking … I also like this post - https://gregorygundersen.com/blog/2020/01/12/why-research-blog/

        Much of https://www.oilshell.org/ is thinking as well. I just spent 3 weeks writing 5 posts, and that greatly clarified the language design. Now so many things are unblocked and can be implemented.

        I don’t think there’s a shortcut – you may think you have an idea, but when you try to explain it to somebody, you will realize you don’t actually have the idea YET. Writing is thinking

        Similar: http://www.paulgraham.com/words.html

        Once you publish something, the convention is that whatever you wrote was what you thought before you wrote it. These were your ideas, and now you’ve expressed them. But you know this isn’t true. You know that putting your ideas into words changed them.

    4. 1

      property-based testing

      I wish I could property-based test, or fuzz-test, any of the code that I usually work with. As far as I know, it’s not possible. For example, how would you apply these techniques to an arbitrary HTTP handler?

      1. 4

        I think the key takeaway from the discussion so far is that this is possible but not easy: when you have complex situations you need to test, you need a lot of skill to generatively test. That’s a totally different skill from hand-testing, and it has a much steeper learning curve. That’s what makes it hard to get into generative testing.

      2. 2

        By “handler”, do you mean a library that implements HTTP? I’m no expert, but two approaches that come to mind for me are

        1. Throw random input at it and test that it doesn’t crash (with sanitizers, if available).

        2. Give the same input to it and to one or more other HTTP implementations and test that they understood it in the same way (or that yours did better, in cases in which the other implementation is known to be wrong).
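
        A minimal sketch of approach 2 (differential testing), where the parser callables are hypothetical stand-ins for real HTTP implementations:

        import secrets

        def differential_test(parsers, n_cases=1000):
            # Feed identical random input to every implementation and flag
            # any disagreement for manual inspection.
            for _ in range(n_cases):
                raw = secrets.token_bytes(64)  # or mutate a known-valid request
                results = []
                for parse in parsers:
                    try:
                        results.append(("ok", parse(raw)))
                    except Exception as exc:
                        results.append(("error", type(exc).__name__))
                if any(r != results[0] for r in results[1:]):
                    print("implementations disagree on:", raw, results)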

        1. 1

          I think my use case reduces to an API which accepts an arbitrary type, and returns an arbitrary type, both of which are too complex to exercise exhaustively. Automated testing would need to provide me a way to produce valid values of the input type, somehow — throwing random input at it wouldn’t be useful, even with sanitizers.

          1. 4

            Automated testing would need to provide me a way to produce valid values of the input type, somehow

            Yeah, that’s the tricky bit. Ultimately, for non-trivial cases, this all boils down to “you need to come up with an ingenious way to generate random valid-ish inputs”. It is possible to do significantly better here than just throwing random bytes at the system, but this generally requires writing a non-trivial amount of code.

            https://fitzgeraldnick.com/2020/08/24/writing-a-test-case-generator.html is a good example here.

            For another real-world example, here’s what we use at work:

            https://github.com/tigerbeetledb/tigerbeetle/blob/main/src/state_machine/workload.zig

            Even for a relatively simple domain model, that’s almost 1k lines.

            1. 1

              That test case generator post is a goldmine. Thanks for sharing. CSmith is such a powerful tool, and it’s great to see a deep dive into how similar tools work.

          2. 2

            Here’s a post that talks about generating input data.

            It can be complex, for sure, but every property-based testing / fuzzing library has extensive API support for creating complex data.

            You can also create the data yourself non-randomly; one example is the category partition method.

            1. 1

              My point is that the code I generally want to fuzz doesn’t take input data; it takes input as typed values. So I basically need the fuzzer to generate values, not bytes.

              1. 4

                Here’s an example in Python for reference, though if you tell me what language you’re working in I could find an example in that language. Say we have an HTTP API for creating blog posts, the input request might look something like:

                from dataclasses import dataclass
                import typing as t

                @dataclass
                class CreateBlogPostRequest:
                    content: str
                    title: str
                    tags: t.List[str]


                We can generate values of CreateBlogPostRequest in a test like:

                from hypothesis import given, strategies as gen

                # Hypothesis derives a generator from the type's field annotations.
                @given(blog_req=gen.from_type(CreateBlogPostRequest))
                def test_blog_post_creation(blog_req):
                    resp = api.handle(blog_req)  # api and the assertion are stand-ins
                    assert_something_about(resp)


                This is using the Hypothesis property-based testing library for Python, but many languages have PBT libraries. In this case, the data is generated from the type definition.

                Check out property-based testing, not fuzzing. They’re similar, but fuzzing is more focused on generating random bytes whereas property-based testing is focused on creating properly typed values. What you’re describing is an everyday task with PBT.

              2. 2

                Fuzzers are moving towards “structure-aware” fuzzing, where they can produce (and in some cases operate directly on) values, not bytes. See for example

          3. 1

            Maybe I misunderstand what you’re saying, but it seems your case is exactly covered by fuzzing templates. Current state-of-the-art fuzzers (AFL comes to mind) allow you to describe input templates, with bits set to specific values or ranges. You are not forced to generate fully random input.

            1. 1

              If you want to effectively fuzz an HTTP handler, you can’t treat the input as an opaque sequence of bits, even if you group those bits in one way or another via templates. The fuzzer needs to generate input data with at least a minimal semantic understanding of the actual request type(s) – randomizing not sequences of opaque bits or bytes on the wire, but rather valid values as defined by the relevant type(s).

              1. 3

                That’s precisely what AFL lets you do. You can give it a grammar and it will generate streams of tokens that match that grammar. I believe it can also give you things that almost match the grammar, so you can check that you handle errors.

                1. 1

                  Ah, OK, so this is an approach where the fuzzer is a separate process from the code under test. My intuition is that fuzzing is, ideally, part of the language-native testing apparatus.

                  1. 2

                    AFL has a bunch of modes for varying degrees of white-box testing. It also has an LLVM pass that, among other things, transforms branches that match on exact values into nested branches that match on bits, which helps it explore the coverage space more usefully. For something that implements an RPC interface, you can just ask the fuzzer to generate the RPC messages, with some hinting about the structure of the messages.

                  2. 1

                    No, AFL can run in many modes. The one I’ve seen most frequently is with the AFL entrypoint as a compilation target of the project: you have a parsing library, and you build a custom fuzzing binary by specifying where and how AFL should call your library.

                    The interfaces that I know of are in C and C++ at least; I would be surprised if there aren’t other bindings.

                    It honestly seems like AFL fully covers the ideal fuzzer design you have in mind; you should check it out. It’s quite a clever piece of engineering.

    5. 1

      On this topic there is a nice video (https://www.youtube.com/watch?v=H4iNuufAe_8) in which one of the rr developers explains how rr can be used to find such bugs.

    6. 7

      FYI, they strip out Google Analytics, which feels a little weird.

      1. 46

        They actually just block all third-party content from loading, by telling the browser not to load it via a Content Security Policy. They do not modify what you upload to the site (afaik).
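
        For illustration, a header along these lines makes the browser itself refuse third-party loads such as Google Analytics (the exact policy such a host sends may differ):

        Content-Security-Policy: default-src 'self'; script-src 'self'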

      2. 33

        It’s the only ethical choice for hosting providers.

      3. 11

        This is also mentioned in the documentation. I guess the target audience are not really using analytics.

    7. 1

      This book is more of a work-in-progress book… right?

      1. 1

        Yes, I don’t think it will ever be “finished” either, as its subject matter is (to some extent) continuously evolving.

    8. 2
      • Introduction to the Analysis of Algorithms by Philippe Flajolet and Robert Sedgewick. It’s a fantastic book, with a lot of useful mathematical techniques.
      • Why Programs Fail by Andreas Zeller. Lots of good debugging theory and strategies.

      There’s also the sled reading list which is really good.

    9. 30

      Makes sense to me, given that it’s a topic whose interest is only going to grow.

      1. 3

        I’m all for a tag, but I suspect that normies aren’t going to stay on the fediverse. They’ll either go back to Twitter or some other corporate social media product. Fediverse just feels like people and bots shouting into the void with relatively little interaction/dialogue. I’m sure you can curate your experience, but I don’t think most people want to go through the hassle. My suspicion is that the Fediverse enthusiasm will fade in a few months.

        1. 13

          Fediverse just feels like people and bots shouting into the void with relatively little interaction/dialogue.

          I have the complete opposite experience. Maybe you are holding it wrong?

          My suspicion is that the Fediverse enthusiasm will fade in a few months.

          How is that relevant at all? We have tags for fortran or dragonflybsd, which are niches of niches; surely we can have one for fediverse.

          1. 2

            I have the complete opposite experience. Maybe you are holding it wrong?

            Maybe? I’ve tried it a lot on several different servers over the years, and tried to make it work, but there was rarely any interaction.

            How is that relevant at all? We have tags for fortran or dragonflybsd, which are niches of niches; surely we can have one for fediverse.

            My post literally opened with “I’m all for a tag”. 🙄 This bit was relevant because the parent claimed that fediverse was going to continue growing in popularity, and I was expressing that it’s unlikely to continue growing in popularity beyond the next month or two. You’re welcome to disagree, but I’m still on-topic.

        2. 4

          What do you mean by “normies”, if I might ask?

          1. 6

            I assumed “normal people”/Non-tech people.

          2. 4

            People who stick to the mainstream as it pertains to some dimension. In this case the dimension is social media platforms, but it could be politics or something else.

          3. 5

            It is pseudo-elitist speak from people who define their identity via the obscure technologies they use.

            1. 0

              It’s also popular among racists and otherwise antisocial online communities.

              1. 8

                You know who else drinks water? Hitler.

                It’s a common phrase all over the Internet. I’m sure some racist somewhere has used it, but that doesn’t imply that it’s particularly affiliated with racists.

                1. 5

                  Indeed; I’ve also heard it in LGBT+ & neurodivergent communities a fair bit.

                  It more broadly suggests “yes, we’re different, and that’s not a bad thing (maybe even a good one).”

              2. 3

                I get that terms like “normie”, “muggle”, or “civilian” may be derogatory depending on context but I’m curious: what racist & antisocial groups are using that term, and what groups do they target with it?

                1. 6

                  I found this paper by googling “how to redpill normies”

                  Redpilling Normies: An Ethnography of Alt-Right 4chan Discourse.

                  Sounds like it should be a good entry point for your research.

                2. 2

                  It really got moving on 4chan. Now, some people will say that not everyone on 4chan is that way, but if someone made a racist joke at Thanksgiving dinner and you laughed, it’s both of you. I don’t break bread with those types, personally.

    10. 4

      As someone with a PhD in Philosophy, I read this when it came out and loved it, but as far as I know, it had no impact on the field.

      1. 6

        There’s a quote by Noam Chomsky which is something to the effect of … “When I presented my ideas to mathematicians, they evaluated me on those ideas. When I presented my ideas to social scientists, they wanted to know what credentials I had before listening to me”.

        Seems like this could be a similar issue. It does seem that many philosophers are more like the latter social scientists – they want to know WHO said something, not simply if it’s true or not.

        From my limited exposure, a lot of philosophy is really about the meaning of words, and who invented a term, and not what’s actually true. There are a lot of convoluted arguments that could be made a lot more precise with simpler language and empirical tests.

        For example, much of what I’ve read about “AI alignment” – which DOES seem to be done by “professional philosophers” with prestigious degrees – suffers greatly from this. They start with questionable premises, and then invent an elaborate framework of knowledge on top of it, which is almost certainly wrong.

        That is, the game is to build more and more rhetoric/provocation on top of this new set of terms, without considering whether the premises are correct. It’s better if what you’re saying isn’t falsifiable for a long time.

        e.g. the book “Superintelligence” is like this, and the funny thing is that he almost admits as much in the preface. Similarly with the “longtermism” thing. It reminds me of this funny quote

        “My No. 1 worry is: what if we’re focussed on entirely the wrong things?” he said. “What if we’re just wrong? What if A.I. is just a distraction?”

        from a long New Yorker article which I commented on here: https://news.ycombinator.com/item?id=32393700


        BTW I also remember reading this Aaronson article on a train over 10 years ago! I’m pretty sure I found it on Hacker News. I got about halfway through and thought it was brilliant. The rest was over my head.

        And I should probably add that my comments probably come off as a bit harsh, and apply mainly to certain things that have gotten a lot of attention, and that I’ve had a longstanding interest in. I did qualify it with “from my limited exposure” :-)

          1. 2

            Of note is that he is referring to political scientists, not philosophers, and Aaronson’s paper specifically targets philosophers closer to the natural sciences.

            1. 3

              Yes, but from what I understand the OP is saying something similar could be the case here (the same scenario, but with a different group - the invariant is the basis for rejecting the idea, not the group rejecting it).

              1. 1

                I can read that from the comment too… but it doesn’t really say “the majority of philosophers rejected Aaronson”, more that his paper didn’t garner much attention.

                Aaronson is semi-controversial but not as controversial as Chomsky ;)

          2. 1

            Thanks for digging it up, I should have done that :)

            It does seem like he is referring to political science, which puts it in a different light … Political arguments are notorious for “attacking the speaker, not the idea”

            But I do think it’s relevant regardless because it’s hard to bring knowledge in from different fields … and arguably this is more of a problem as time goes on, simply because there’s more knowledge and more people writing about it

        1. 4

          From my limited exposure, a lot of philosophy is really about the meaning of words, and who invented a term, and not what’s actually true.

          That’s the effect of the academy, AFAICT. It’s not like the olden days when Aquinas could believe in God and so did all his colleagues, and they debated which proofs of God’s existence worked or didn’t. Now you have one guy studying Aquinas, one guy studying Plato, one guy studying Derrida, etc. and they don’t agree on anything fundamental, so to come to a consensus, you just talk about the history of the ideas and not their actual truth. You do get some actual disputes, but it’s all on technical minutiae, like “is justification internal or external?” That sort of thing.

          There are a lot of convoluted arguments that could be made a lot more precise with simpler language and empirical tests.

          Hume said the same thing. He was wrong. :-)

          1. 2

            You do get some actual disputes, but it’s all on technical minutiae, like “is justification internal or external?” That sort of thing.

            I think that this may be a bad example—or maybe I misunderstand your point. (I realize that you have a PhD in philosophy, but I’m also writing for others. I don’t mean to be teaching you things you don’t know about, e.g., the justified true belief theory of knowledge.)

            A lot of people have thought that we can define knowledge as justified true belief. If we do that, we will need to understand what we mean by justified. Once we dig into that question, it is far from trivial whether justification is internal or external. I’m assuming we all agree that “What is knowledge?” is a real and significant question. Since we want to understand knowledge, and understanding justification seems required in order to understand one long-standing and influential account of knowledge, I think that whether justification is internal or external is a very real dispute and not merely technical minutiae. (As an additional point, I think that the arguments in favor of internalism and externalism themselves are interesting and well worth considering. There are strong reasons in favor of both, but both have problems. It’s a fascinating debate. YMMV, of course.)

            Here’s a different way of putting this. People inside and outside of academic philosophy often long for the good old days when philosophy wasn’t so technical or hyper-specialized. (Lots of longing for the good old days when Bertrand Russell wrote on logic, ethics, and epistemology.) But I wonder whether that nostalgia isn’t parallel to (a hypothetical) longing for the good old days of Newtonian physics—you know, when physics was more straightforward and not so hyper-specialized. I don’t think we can (or should want to) go backwards. Philosophy is far more complicated and specialized, and this can be frustrating, but I don’t think that’s primarily or only because of (say) pressures to publish or sloppy thinking. The questions are genuinely, deeply difficult.

            Bottom line: I am not convinced that contemporary philosophers have given up the pursuit of truth and replaced it with the history of ideas.

            1. 1

              The key thing is context. Trying to avoid it brings problems. Just talk to the people you are talking to and stop stressing the global truth of your statements. See, I did it just now, I made a statement without a disclaimer and I trust that you understand that it is mostly in alignment with what I believe, not a global truth, but now that I’ve added this disclaimer I’m back to trying to climb the hill of global truth-y-ness.

              Language is decentralized and ever changing. The western philosophy tradition is too obsessed with who was first or whatever instead of realizing that what we actually want is to pass the torch and teach the next generation our deeply held insights (without trapping them in minutiae and technical arguments, which can be important, but are also often a case of missing the forest for the trees).

            2. 1

              I agree broadly.

              I think the internalism/externalism debate qualifies as minutia in the sense that it only makes sense as a question if you already accept the idea that knowledge is a justified true belief. We all agree that we want knowledge and it’s better to have knowledge than ignorance (or do we?), but only a smaller number of people agree that knowledge is something like justified true belief. Among people who are debating JTB, there are Gettier problems and whatnot, and so they end up digging into well, what is justification anyway and how does that work? And then you can go further down and have various flavors of internalism/externalism and hybrid theories and controversies within them.

              It’s fair though for someone else to look at the justification debate from outside and say, well, that’s all irrelevant because I have a pragmatic theory of knowledge, and JTB doesn’t apply, or I’m Nietzschean and “knowledge” is just the mask of will to power, etc. It would be really good if we could actually nail down all the big stuff before we really dig into the small stuff, but alas, that’s not how the hermeneutic circle works.

          2. 1

            Hm interesting, I’d be interested to read what Hume said about it, and why he was wrong :)

            1. 2

              Enquiry Concerning Human Understanding 7.1

              The chief obstacle, therefore, to our improvement in the moral or metaphysical sciences is the obscurity of the ideas, and ambiguity of the terms.[ …] And, perhaps, our progress in natural philosophy is chiefly retarded by the want of proper experiments and phaenomena, which are often discovered by chance, and cannot always be found, when requisite, even by the most diligent and prudent enquiry. As moral philosophy seems hitherto to have received less improvement than either geometry or physics, we may conclude, that, if there be any difference in this respect among these sciences, the difficulties, which obstruct the progress of the former, require superior care and capacity to be surmounted.

              As for why it’s wrong, so far no attempt at doing philosophy in clear language has succeeded, but maybe we just need to give it another three hundred years.

        2. 1

          There’s a quote by Noam Chomsky which is something to the effect of … “When I presented my ideas to mathematicians, they evaluated me on those ideas. When I presented my ideas to social scientists, they wanted to know what credentials I had before listening to me”.

          This is the ideal we should all strive for and why anonymity is so important. Also why I believed datalisp would not require much rhetoric… although I should have realized that Ron Rivest was not enough to change the way we do the web, so who am I? Still. The reason mathematicians behave this way is because they know that theorems hold up in reality.

    11. 17

      There is a nearly perfect solution, namely equality saturation.

      From the tutorial:

      equality saturation explores all possible variants of a program that can be derived from a set of rewrites, and then it extracts the best one

      You have to build your own equivalence relations which can be a lot of work. While nothing is perfect, this beats the heck out of doing this by hand.
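
      To make the explore-then-extract shape concrete, here is a deliberately naive sketch: unlike a real e-graph it materializes every variant instead of sharing subterms, and the two rewrite rules and the cost function are made up for the demo.

      def rewrites(expr):
          # Yield expressions equivalent to `expr` under a toy rule set.
          if isinstance(expr, tuple):
              op, *args = expr
              if op == "*" and args[1] == 2:
                  yield ("<<", args[0], 1)  # x * 2  ->  x << 1
              if op == "*" and args[1] == 1:
                  yield args[0]             # x * 1  ->  x
              for i, a in enumerate(args):  # also rewrite subexpressions
                  for r in rewrites(a):
                      yield (op, *args[:i], r, *args[i + 1:])

      def cost(expr):
          # User-provided cost function: shifts are cheaper than multiplies.
          if not isinstance(expr, tuple):
              return 1
          op, *args = expr
          return {"*": 4, "<<": 1}[op] + sum(cost(a) for a in args)

      def saturate_and_extract(expr, max_iters=10):
          seen = {expr}
          frontier = {expr}
          for _ in range(max_iters):  # the iteration cap stands in for a timeout
              new = {r for e in frontier for r in rewrites(e)} - seen
              if not new:             # fixed point reached: saturation
                  break
              seen |= new
              frontier = new
          return min(seen, key=cost)  # extraction

      print(saturate_and_extract(("*", ("*", "a", 1), 2)))  # -> ('<<', 'a', 1)

      An actual e-graph (as in egg) represents all of these variants in one shared structure, which is what keeps the explosion manageable.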

      1. 14

        In the article it is briefly mentioned that the Cranelift compiler backend is considering/planning to adopt equality saturation! It’s a very cool approach.

      2. 4

        …all possible variants of a program…

        Sounds like a great way to replace your problem with “come up with a useful set of rewrites that can be explored before the end of the universe”. Sounds like a fun paper to read! Lemme give it a skim…

        Upon reaching a fixed point (saturation), 𝐸 will represent all equivalent ways to express 𝑝 with respect to the given rewrites. After saturation (or timeout), a final extraction procedure analyzes 𝐸 and selects the optimal program according to a user-provided cost function.

        So it searches until some termination condition is reached or until it times out. On the one hand it sounds like a potentially really good approach and on the other it seriously pings my “all you need to do is tune your parameters right! Now you have two problems!” senses. I admit that being able to say “spend <= 5 seconds optimizing this program” or “spend <= 4 hours optimizing this program since this is running overnight in CI anyway” sounds pretty appealing. So, I’ll reserve judgement until I see it in practice more.

        1. 9

          Author of that paper here. You are right that many sets of rewrites can spin out until the end of time. But, in practice, we’ve found that saturation is rarely necessary: you typically find what you’re looking for pretty early!

          Obligatory ad for the Zulip if you’re into this kind of thing: https://egraphs.zulipchat.com/

          1. 2

            you typically find what you’re looking for pretty early!

            How do you know you have what you’re looking for?

            1. 2

              When you extract a term, you know it’s the best thing you know about so far. Unless you saturate, the next best thing might be “right around the corner”. But that’s true for a lot of optimization processes.

              1. 2

                Would be interesting to see if you could hook up a SAT solver somehow, to prove that no better case exists. I’ve thought about it in a more general case of ISAs, but that’s very hard to model for SAT, maybe it could be easier with egraphs.

                1. 2

                  I think you’re describing a variant of a superoptimiser?

                  1. 1

                    I guess? There might be a difference from traditional ones though in that with e-graphs you can already get pretty close, and they might be easier and faster to model in solvers, since apparently they already use them for representation.

          2. 2

            Thanks! I mostly spend my time in the /r/programminglanguages Discord for better or worse. People there mentioned that egraphs still have a hard time dealing with control flow equivalences though, have any good sources for discussion on the problems or solutions involved with that?

            1. 6

              Yes, dealing with control flow is difficult. Ultimately, while the “put all your rewrites in and push go” story is easy to sell, I don’t think it’s realistic. I think where we need to go (and this is stuff we are currently working on) is how to combine egraphs with more conventional techniques. Being able to represent many programs “superimposed” on one another is just too good to give up. We have seen success in many domains with the more simplistic plug-and-chug approach, but it won’t scale to general-purpose languages. The Cranelift folks realized this and are still carefully selecting what things to do in the egraph.

              1. 3

                Thanks, it’s nice to know about the gotchas as well. At worst it sounds like there’s potential for good and flexible transformations within basic blocks, which is still pretty useful. Is the problem that the number of valid transformations/combinations explodes when you start considering control flow and it’s just hard to deal with, or that it’s hard to measure which program variant is best, or something else?

                …oh, this just occurred to me. If I can bug you more… do you know if anyone has tried applying egraphs to SIMD auto-vectorization and how well it works if so? Seems like a place it could shine, given how hard vectorization is to represent by other methods. …I might just have to start digging through google scholar and compiler conference talks…

                1. 1

                  E-graphs very naturally represent expressions; representing statements and control flow (or anything else that requires context) gets awkward. It’s definitely possible, just needs more work!

                  Re: vectorization https://www.cs.cornell.edu/~avh/diospyros-asplos-2021-preprint.pdf

        2. 1

          In practice, we just put rewrite rules into egg and let its scheduler handle rule selection and timeouts.

    12. 15

      I think CHERI is a big unknown here. If CHERI works, language level memory safety is less valuable, and Zig will be more attractive and Rust less.

      I am pretty optimistic about CHERI. The technology is solid, and its necessity is clear. There is just no way we will rewrite existing C and C++ code. So we will have CHERI for C and C++, and Zig will be an unintended beneficiary.

      1. 26

        For desktop and mobile applications, I’d prefer a safety solution that doesn’t require a billion or more people to throw out the hardware they already have. So whatever we do, I don’t think relying exclusively on CHERI is a good solution.

        1. 3

          People throw away their hardware, at least on average, once every decade. I’d much rather a solution that didn’t require rewriting trillions of dollars of software.

          1. 2

            People throw away their hardware, at least on average, once every decade.

            True, the software upgrade treadmill forces them to do that. But not everyone can keep up. Around 2017, I made friends with a poor person whose only current computer was a lower-end smartphone; they had a mid-2000s desktop PC in storage, and at some point also had a PowerPC iMac. It would be great if current software, meeting current security standards, were usable on such old computers. Of course, there has to be a cut-off point somewhere; programming for ancient 16-bit machines isn’t practical. I’m afraid that 3D-accelerated graphics hardware might be another hard requirement for modern GUIs; I was disappointed that GNOME 3 chose fancy visual effects over desktop Linux’s historical advantage of running well on older hardware. But let’s try not to keep introducing new hardware requirements and leaving people behind.

      2. 13

        Wouldn’t CHERI still discover these issues at runtime versus compile time? Do not get me wrong, I’m still bullish on CHERI and it would be a material improvement, but I do think finding these bugs earlier in the lifecycle is part of the benefit of safety as a language feature.

        1. 8

          That’s why I said “less valuable” instead of “nearly useless”.

          1. 2

            Makes sense, thank you, just checking my understanding

      3. 2

        Is there a performance overhead from using CHERI?

        1. 6

          The CheriABI paper measured 6.8% overhead for a PostgreSQL benchmark running on FreeBSD in 2019. It mostly comes from the larger pointers (128 bits) and their effect on cache.

          1. 1

            Note that those numbers were from the CHERI/MIPS prototype, which was an in-order core with a fairly small cache but disproportionately fast DRAM (cache misses cost around 30ish cycles). Setting the bounds on a stack allocation was disproportionately expensive, for example, because the CPU couldn’t do anything else that cycle, whereas a more modern system would do that in parallel with other operations and so we typically see that as being in the noise on Morello. It also had a software-managed TLB and so couldn’t speculatively execute on any paths involving cache misses.

            The numbers that we’re getting from Morello are a bit more realistic, though with the caveat that Arm made minimal modifications to the Neoverse N1 for Morello and so couldn’t widen data paths of queues in a couple of places where the performance win would have been huge for CHERI workloads relative to the power / area that they cost.

        2. 3

          We’re starting to get data on Morello, though it’s not quite as realistic a microarchitecture as we’d like; Arm had to cut a few corners to ship it on time. Generally, most of the overhead comes from doubling pointer sizes, so varies from almost nothing (for weird reasons, a few things get 5-10% faster) to 20% for very pointer-dense workloads. Adding temporal safety on top, on the four worst affected of the SPECCPU benchmarks, costs about 1% for two and closer to 20% for the others (switching from glibc’s malloc to snmalloc made one of those 30% faster on non-CHERI platforms; some of SPEC is really a tight loop around the allocator). We have some thoughts about improving performance here.

          It’s worth noting that any microarchitecture tends to be tuned for specific workloads. One designed for CHERI would see different curves, because some things would be sized where they hit big wins for CHERI but diminishing returns for everything else. The folks working on Rust are guessing that Rust would be about 10% faster with CHERI. I believe WASM will see a similar speed up, and MSWasm could be 50% or more faster than software enforcement.

          1. 1

            for weird reasons, a few things get 5-10% faster

            If you happen to have any easily-explained concrete examples, I’d be curious to hear about these weird reasons…

            1. 5

              I don’t know if anyone has done root-cause analysis on them yet, but typically it’s things like the larger pointers reduce cache aliasing. I’ve seen one of the SPEC benchmarks get faster (I probably can’t share how much) when you enable MTE on one vendor’s core because they disable a prefetcher with MTE and that prefetcher happens to hit a pathological case in that one benchmark and slow things down.

              It’s one of the annoying things you hit working on hardware security features. Modern CPUs are so complex that changing anything is likely to have a performance change of up to around 10% for any given workload, so when you expect your overhead to be around 5% on average you’re going to see a bunch of things that are faster, slower, or about the same. Some things have big differences for truly weird reasons. I’ve seen one thing go a lot faster because a change made the read-only data segment slightly bigger, which made two branch instructions on a hot path land in slightly different places and no longer alias in the branch predictor.

              My favourite weird performance story was from some Apple folks. Apparently they got samples of a newer iPhone chip (this is probably about 10 years ago now), which was meant to be a lot faster, and they found that a core part of iOS ran much, much slower. It turned out that, with the old core, it was always mispredicting a branch, which was issuing a load, and then being cancelled after 10 cycles or so. In the newer core, the branch was correctly predicted and so the load wasn’t issued. The non-speculative branch needed that load a couple of hundred cycles later and ended up stalling for 100-200 cycles waiting for memory. The cost of the memory wait was over an order of magnitude higher than the cost of the branch misprediction. They were able to add an explicit prefetch to regain performance (and get the expected benefit from the better core), but it’s a nice example of how improving one bit of the hardware can cause a huge systemic regression in performance.

              1. 1

                Interesting, thanks – reminds me of some of the effects described in this paper (performance impacts of environment size and link order).

                Once doing some benchmarking for a research project circa 2015 or so I found a MySQL workload that somehow got consistently somewhat faster when I attached strace to it, though I unfortunately never figured out exactly why or how it happened…

                1. 2

                  There was another similar paper at ASPLOS a few years later where they compiled with function and data sections and randomised the order of a bunch of benchmarks. They found that this gave a 20% perf delta and that a lot of papers about compiler optimisations were seeing a speed up simply as a result of this effect. Apparently the same team later produced a tool for properly evaluating optimisations that would do this randomisation and apply some statistics to see if your speed up is actually statistically significant.

    13. 2

      Tests should be run in random order by default.

      Random and parallel by default!

      1. 3

        Parallel, yes, random, not so certain.

        For me, the issue is reproducibility. One thing that has been the bane of my life is flaky tests (for whatever reason). So something that deliberately adds to this flakiness doesn’t help.

        Whenever I do tests which include randomness (for instance Property based tests https://scalacheck.org/) I always end up specifying the seed so I can reproduce any issues.

        1. 12

          RSpec (and I suspect most test frameworks) lets you specify a seed for a run. When not specified, it will pick one at random. So by default your tests run in random order, but when you have a flaky test you can specify the seed and run the tests in that particular order to debug it. Best of both worlds.
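
          Concretely (the seed value here is illustrative):

          $ rspec
          Randomized with seed 61902
          ...
          $ rspec --seed 61902   # re-run in exactly the same order to debug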

        2. 7

          If you’re able to reproduce test failures (by failures returning the seed and runs able to take the seed as a parameter), and if you treat failures as actual bugs to fix (whether that be a bug in the application or a bug in the tests), then I don’t have a problem with randomised tests that pass sometimes and fail other times.

          After all, the point of a testsuite is to find bugs, not to be deterministic. So if randomness is an effective way to find bugs, then I’m all for it!

          1. 1

            After all, the point of a testsuite is to find bugs, not to be deterministic. So if randomness is an effective way to find bugs, then I’m all for it!

            The issue I would bring up in the case of random test order is that while it does find bugs, the bugs it finds are often not in the application code under test – instead it tends to turn up bugs in the tests. And from there we get into the debate about cost/benefit tradeoffs. If putting in the time and effort (which aren’t free) to do testing “the right way” offers only small marginal gains in actual application code quality over doing a bare-minimum testing setup – and often, the further you go into “the right way” the more you encounter diminishing returns on code quality – then should you be putting in the time and effort to do it “the right way”? Or do you get a better overall return from a simpler testing setup and spending that time/effort elsewhere?

            1. 1

              Tests are code. Code can have bugs. Therefore, tests can have bugs.

              But as with any bug, it’s hard to say in general case what consequences are. Maybe it’s just a flaky test. Maybe it’s a bug that masks a bug in the application code.

              You’re also right that software quality is on a spectrum. A one-off script in bash probably doesn’t need any tests. And formal correctness proof is very time/money-expensive. Another blog engine probably doesn’t need that level of correctness.

              Of all the things one can do to improve software quality (including test code quality) running tests in random order is not that expensive.

        3. 2

          For me, the issue is reproducibility. One thing that has been the bane of my life is flaky tests (for whatever reason). So something that deliberately adds to this flakiness doesn’t help.

          Admittedly a truism, but: If the “test” is flaky, then it is not a test.

          Whenever I do tests which include randomness (for instance Property based tests https://scalacheck.org/) I always end up specifying the seed so I can reproduce any issues.

          Exactly. This is very good practice.

          I have taken it a step further: When I build randomized tests, I have the failing test print out the source code of the new test that reproduces the problem.
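
          A small sketch of that pattern, with a made-up encode/decode pair as the system under test:

          import random

          def encode(vals):
              return ",".join(map(str, vals))

          def decode(s):
              return [int(x) for x in s.split(",")] if s else []

          def test_roundtrip_randomized():
              seed = random.randrange(2**32)
              rng = random.Random(seed)
              values = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
              try:
                  assert decode(encode(values)) == values
              except AssertionError:
                  # Print a ready-to-paste regression test before re-raising.
                  print(f"def test_repro_{seed}():")
                  print(f"    assert decode(encode({values!r})) == {values!r}")
                  raise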

        4. 2

          Catch2 always prints its seed, so you can let it generate a random seed for your normal test runs, but if a test fails in CI or something, you can always look back at the log to see the seed and use that for debugging.

          If you can’t reproduce a failing test because of randomness, that’s a failure of the test runner to give you the information you need.

        5. 1

          Have you tried rr? Especially with its chaos mode enabled it’s really helpful for fixing nondeterministic faults (aka bugs).

    14. 2

      Bookmarked. I’m giving this a whirl the moment it’s available in stable.

      1. 1

        I don’t think it will be available in stable for a while - to get around this I usually run Fuzzcheck (and some other tools which are nightly-only) with a nightly compiler build and everything else in stable.

    15. 2

      Wow - completely coincidentally I also moved my website from Cloudflare to Fly.io (served using Caddy) this afternoon. No Nix though :)

      1. 3

        Would you be able to say a few words on why you moved a static site away from CF?

        1. 6

          Their (lack of) policies against hate speech.

    16. 4

      Forgive me if I missed it. What is WebAssembly doing here? It sounds like the runtime uses the WebAssembly output? What is the benefit of this instead of compiling the Rust code for whatever hardware your server is running?

      Interesting project either way! The api seems nice at first glance.

      I see security listed as a WebAssembly benefit, but that is for running untrusted code, right? I imagine your web server code is trusted.

      1. 18

        It replaces async/await. A WASM interpreter is able to suspend WASM code at any time, without involving OS threads or explicit futures.

        1. 4

          If you have any articles about that, please post them. I read the lunatic readme and it’s a little light on details. It mostly just described how it inherits the benefits of webassembly.

          1. 5

            It definitely leverages a lot of WebAssembly’s features (e.g. very lightweight sandboxing, support for various languages, etc). The wasmtime documentation (the WebAssembly JIT which Lunatic uses) has a lot of information about how it executes WebAssembly, as well as the API it provides to manipulate WebAssembly programs and interface with them while running them.

            In terms of concrete things Lunatic does (as I understand it):

            • all IO operations are processed asynchronously on the Rust side (so other processes can continue to run even while one process is waiting on the result of a blocking IO operation)
            • every process is only permitted to run for a certain number of instructions before it is paused so that a different process can run (see the sketch below)
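
            A toy illustration of the second point; this is not Lunatic’s or wasmtime’s actual API, just Python generators standing in for WASM processes, with each yield playing the role of one executed instruction:

            from collections import deque

            BUDGET = 3  # "instructions" a process may run before being preempted

            def process(name, steps):
                for i in range(steps):
                    yield f"{name}: instruction {i}"

            def run(procs):
                ready = deque(procs)
                while ready:
                    p = ready.popleft()
                    for _ in range(BUDGET):
                        try:
                            print(next(p))
                        except StopIteration:
                            break           # process finished: drop it
                    else:
                        ready.append(p)     # budget spent: preempt and requeue

            run([process("A", 5), process("B", 4)])
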
        2. 4

          By the time you’re selecting which interpreter to suspend and resume, you’ve reinvented a thread scheduler. However, your thread scheduler has a VM save/restore in the loop. You’d need to measure, but it may not be faster than just using a thread.

      2. 2

        The wasm is because this is built on Lunatic, which is a wasm runtime.

    17. 3

      I never got to take a compiler course in school, so I’m toying around with writing a compiler front-end (lexer/parser) from scratch. I’ve done stuff on my own with lex(flex)/yacc(bison) before; for this I’m not inventing a language or anything, just learning things. I wrote a math expression parser with order of operations, including evaluation; simple, but it made me happy.

      I’m torn between learning about type checking vs code generation. My goal is to get to the point of generating a simple “hello world” standalone executable, maybe using QBE or LLVM as a backend.

      1. 2

        What language are you compiling? Or is it some ad-hoc you’ve thrown together for the purpose?

        1. 1

          I’m just doing math expressions right now, I was thinking of doing something small and well-defined, like a small Python subset.

      2. 2

        If you’re using Rust, I highly recommend Cranelift.

        1. 1

          Interesting, but I’m using C++.

    18. 2

      I recommend Andreas Zeller’s “Why Programs Fail: A Guide to Systematic Debugging”, in which the author suggests keeping a log, where entries are in the form

      • Hypothesis: The sample program works.
      • Prediction: The output of sample 11 14 is “11 14.”
      • Experiment: We run sample as previously.
      • Observation: The output of sample 11 14 is “0 11.”
      • Conclusion: The hypothesis is rejected.

      (This is an example for a toy program; real examples are more complex.)

      Combined with algorithmic debugging, it is a truly invaluable method for finding bugs (a small code sketch follows the steps below):

      1. Assume an incorrect result $R$ has the origins $O_1, O_2, \dots, O_n$.
      2. For each of the origins $O_i$ , algorithmic debugging inquires whether the origin $O_i$ is correct or not.
      3. If one of the origins $O_i$ is incorrect, algorithmic debugging restarts at step 1 with $R = O_i$.
      4. Otherwise, all origins $O_i$ are correct. Then, the infection must have originated at the place where $R$ was computed from the origins. The process terminates.
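
      As promised, a minimal sketch of this loop; the node structure is hypothetical, and in practice the is_correct oracle is usually the programmer answering yes/no:

      class Node:
          # One computed value plus the origins it was computed from.
          def __init__(self, label, value, origins=()):
              self.label, self.value, self.origins = label, value, list(origins)

      def locate_infection(r, is_correct):
          # Walk from the incorrect result toward its first incorrect origin
          # (steps 2-3); stop where all origins are correct (step 4).
          for origin in r.origins:
              if not is_correct(origin):
                  return locate_infection(origin, is_correct)
          return r  # the infection originated where r itself was computed
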
      1. 1

        That’s the kind of thing I was looking for! TBH the above sounds somewhat heavy-weight for regular use (especially how hard describing some hypotheses can be, let alone summarizing results), but OTOH there is some unavoidable complexity in debugging and perhaps at some point this kind of overhead becomes negligible?

        Thanks for the book recommendation! I’ll try to get my hands on it and use its suggestions as a starting point for further experimentation.

        1. 1

          (especially how hard describing some hypotheses can be, let alone summarizing results)

          I found this too when I first tried the techniques, but I’ve come to realise that if I can’t write down hypotheses about why the system isn’t working, then this usually means I need to try and understand the system better before I can effectively debug it.

          TBH the above sounds somewhat heavy-weight for regular use… OTOH there is some unavoidable complexity in debugging and perhaps at some point this kind of overhead becomes negligible?

          The author considers this too!

          Not every problem needs the full strength of the scientific method or the formal content of a logbook. Simple problems should be solved in a simple manner—without going through the explicit process. If we find a problem we suppose to be simple, the gambler in us will head for the lighter process. Why bother with formalities? Just think hard and solve the problem.

          The problem with such an implicit “quick-and-dirty” process is to know when to use it. It is not always easy to tell in advance whether a problem is simple or not. Therefore, it is useful to set up a time limit. If after 10 minutes of quick-and-dirty debugging you still have not found the defect, go for the scientific method instead and write down the problem statement in the logbook. Then, straighten out your head by making everything formal and exact—and feel free to take a break whenever necessary.