Threads for meithecatte

    1. 14

      Unfortunately Shellcheck isn’t aware of this behavior yet. I’ve opened an issue: https://github.com/koalaman/shellcheck/issues/3088

      1. 10

        ShellCheck can’t really do anything about it, because

        1. It depends on the value of variables at runtime. You could detect it for constants, but most variables are not constants.
        2. It only happens in ksh-derived shells, not dash, busybox ash, yash, OSH, etc.

        As mentioned elsewhere in this thread, OSH has arrays (arrays being why bash has the bug), but it does not have any hidden eval, including this bug.

        OSH is the most bash-compatible shell in the world, and also the most bash-compatible one that doesn’t have this bug :)
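
        If you want to check for yourself, a quick sketch mirroring the zsh test further down the thread (assumes osh is on your PATH; the expected results are per the claim above):

        # bash: the subscript's command substitution runs, so PWNED gets created
        bash -c 'x=$1; [[ $x -eq 42 ]]' dummy 'a[$(echo 42 > PWNED)]'; cat PWNED; rm -f PWNED

        # osh: no hidden eval, so PWNED should not exist afterwards
        osh -c 'x=$1; [[ $x -eq 42 ]]' dummy 'a[$(echo 42 > PWNED)]'; cat PWNED; rm -f PWNED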

        1. 6

          It depends on the value of variables at runtime. You could detect it for constants, but most variables are not constants.

          ShellCheck already forces you to acknowledge certain warnings, such as the one that echo '${foo}' is most likely not what you intended.

          I believe it would be reasonable for it to:

          • suggest using [ "$foo" -lt 5 ] over [[ "$foo" -lt 5 ]] (see the sketch below)
          • suggest double-checking any use of (( foo + 5 )) to ensure foo can only ever be numeric
          • suggest using only trusted inputs, or putting limits on them, in any of the constructs that can trigger this
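
          A minimal sketch of that first point, reusing the payload from elsewhere in the thread: the POSIX-style [ refuses the non-numeric value (roughly the error shown below), while [[ quietly evaluates it.

          $ x='a[$(echo "evaluated" >&2)]'
          $ [ "$x" -lt 5 ]     # single brackets: complains, nothing is executed
          bash: [: a[$(echo "evaluated" >&2)]: integer expression expected
          $ [[ "$x" -lt 5 ]]   # double brackets: hidden arithmetic evaluation
          evaluated
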
          1. 10

            The author of this post learned of the issue from a 2018 post on Vidar Holen’s blog. Vidar is the author of ShellCheck:

            https://www.vidarholen.net/contents/blog/?p=716

            Pretty soon ShellCheck will just have to warn that bash is installed on the system:

            # shellcheck.sh
            if command -v bash; then
               echo "Don't use bash"
               exit 1
            fi
            
            1. 2

              Ahh. Yeah, maybe we’re getting close to the point where Bash can be abandoned.

        2. 2

          ShellCheck should maybe warn about probably-dangerous indirection like "${!varname}" and printf -v "$varname" (if it doesn’t warn already).
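
          For the printf -v case (which is on the list of affected places mentioned downthread), a rough sketch of why an attacker-controlled variable name is dangerous; the payload only writes to stderr:

          $ varname='a[$(echo "evaluated" >&2)]'
          $ printf -v "$varname" '%s' hello   # the array subscript in the name gets evaluated
          evaluated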

      2. 2

        Is there a mitigation other than validating that variables used in an arithmetic context only contain numeric values before use? E.g.:

        $ x='a[$( echo "evaluated" >&2 )]'
        $ # bad
        $ if (( x > 0 )); then echo "gt 0: $x"; fi
        evaluated
        $ # OK
        $ if [[ $x =~ ^[0-9]*$ ]] && (( x > 0 )); then echo "gt 0: $x"; fi
        
        $ x=42
        $ if (( x > 0 )); then echo "gt 0: $x"; fi
        gt 0: 42
        $ if [[ $x =~ ^[0-9]*$ ]] && (( x > 0 )); then echo "gt 0: $x"; fi
        gt 0: 42
        
        1. 3

          only contain numeric values before use?

          I think that’s the appropriate thing to do; arithmetic on untrusted input is something you’d discourage in any language, I believe. I’d also add a [ ${#x} -lt 5 ] in there to keep the value short if it’s untrusted data. But still, quite awful.
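
          Putting the two checks together, a minimal sketch of what that validation might look like (x just stands in for some untrusted input; the + instead of * also rejects the empty string, and the 10# prefix keeps leading zeros from being read as octal):

          x=$1   # whatever untrusted value came in
          if [[ $x =~ ^[0-9]+$ ]] && [ "${#x}" -lt 5 ] && (( 10#$x > 0 )); then
              echo "gt 0: $x"
          fi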

          1. 4

            Though, the Python equivalent of the bash code above would be:

            if re.match(r'^[0-9]*$', x) and eval(x) > 0:
                print(f"gt 0: {x}")
            

            and that, clearly, is insane.

    2. 5

      From the Bash FAQ (http://mywiki.wooledge.org/BashFAQ/031)

      As a rule of thumb, [[ is used for strings and files. If you want to compare numbers, use an ArithmeticExpression, […]

      So using [[ to compare strings with integer operators is arguably incorrect, and could probably be caught by a linter.

      1. 13

        It works with arithmetic expressions too though! And not just on the left side of the expression. This is mad.

        $ x='a[$( echo "evaluated" >&2 )]'
        $ (( 1 == "$x" ))
        evaluated
        $ (( 1 < "$x" ))
        evaluated
        $ (( "$x" > 1 ))
        evaluated
        
        1. 4

          I wondered whether this could be avoided by using the variable the shorter way (which linters usually consider more correct), but apparently that doesn’t help either:

          $ x='a[$( echo "evaluated" >&2 )]'
          $ (( 1 == x ))
          evaluated
          
          1. 11

            Yes. Funnily enough, it’s recursive.

            ~$ x=y; y=x
            ~$ ((x))
            bash: ((: y: expression recursion level exceeded (error token is "y")
            
            ~$ x=y; y=z; z=meow; meow='a[$( echo "evaluated" >&2 )]'
            ~$ ((x))
            evaluated
            
    3. 29

      Initially I read the title and part of the article, and I almost dismissed it out of hand.

      However, although I consider myself to know bash very well and to write very defensive scripts, I couldn’t believe my eyes when I saw what happens when the following snippet is executed:

      a=''
      x='a[$( sleep 2s )]'
      [[ "$x" -eq 42 ]]
      

      I.e. if one replaces sleep 2s with another command that has side effects, that command is executed. (I didn’t manage to make it interact with the console, but the command does execute.)

      I really can’t state how shocked I am to learn this… I now wonder whether there are other places in bash where, although one properly quotes arguments (and in the case of [[ the quoting isn’t even necessary, according to the documentation), this hidden evaluation happens?

      1. 26

        There are also 10+ other places it happens in the language! It’s anywhere that bash and ksh accept a number, like $(( x )) and ${a[x]}, the argument to printf -v, unset, etc.

        I listed them all here in 2019 - https://github.com/oils-for-unix/blog-code/blob/main/crazy-old-bug/ss2-demos.sh#L49
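
        Two of those, as a quick sketch in the same style as the examples above (bash assumed; the payload only writes to stderr):

        a=(42)                               # any array element to read
        x='b[$(echo "evaluated" >&2)]'       # untrusted string

        echo $(( x ))     # arithmetic expansion: runs the command substitution, then prints 0
        echo "${a[x]}"    # array subscript: runs it again, then prints 42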

        Back then, I was able to find vulnerable example code on the web, but it was difficult to find a “real” script that was vulnerable – i.e. one where user input was actually subject to the hidden eval, aka user-supplied code execution.

        Otherwise I would have made a lot more noise about it. (OpenBSD ksh fixed it based on my report, but bash didn’t)

        (See my comments elsewhere in this thread)

        1. 2

          Did you also report this to zsh? What did they say? It is vulnerable to ciprian’s example and probably some of yours.

          1. 5

            I didn’t report it to zsh in 2019, probably because I didn’t find it then. I just tried again and didn’t find it in zsh:

            https://lobste.rs/s/mla0ns/til_some_surprising_code_execution#c_odikpl

            $ zsh -c 'x=$1; echo $x; [[ "$x" -eq 42 ]]' dummy 'a[$(echo 42 > PWNED)]'; cat PWNED; rm PWNED
            a[$(echo 42 > PWNED)]
            cat: PWNED: No such file or directory
            
            $ zsh --version
            zsh 5.9 (x86_64-debian-linux-gnu)
            

            Though I’d be interested if anyone else sees it


            I think it’s a “ksh-ism” that made it into bash. Most bash-isms are actually ksh-isms … i.e. sorry to say but David Korn of AT&T, or somebody on his team, probably deserves the blame for this :-/

            Somehow they mixed up parsing and execution, in a way that’s significantly worse than Bourne shell / POSIX shell. (Bourne deserves the blame for word splitting, which OSH fixes as well.)

            1. 10

              I was asked to point out that the second issue actually exists in ZSH: you just have to use a previously defined variable (in your example, $a isn’t defined, so there is nothing for the index to be evaluated against). If you re-use the existing PWD definition to smuggle in the evaluation, you get the (un)desired result:

              > zsh -c 'x=$1; echo $x; [[ "$x" -eq 42 ]]' dummy 'PWD[$(echo 42 > PWNED)0]'; cat PWNED; rm PWNED
              PWD[$(echo 42 > PWNED)0]
              42
              

              (the “0” isn’t technically necessary, but silences an error so the test just fails)

              1. 3

                Oh wow, thanks for the correction! zsh is indeed vulnerable – I confirmed it on my machine

                Yeah the problem was a[] wasn’t defined, so that error was masking the vulnerability.

      2. 12

        Interestingly it does not always trigger:

        a=''
        x='a[$( sleep 2s )]'
        [[ "$x" = 'hello' ]]  # Does not exec code
        [[ -z  "$x" ]]  # Does not exec code
        [[ "$x" -eq 42 ]] # EXEC
        [[ "$x" -lt 42 ]] # EXEC
        

        This is terrifying. I’m reading through and testing some public facing scripts right now.

        EDIT:

        • Using [ ] instead of [[ ]] seems to avoid this issue; errors are thrown about things not being integers instead.
        • dash doesn’t seem to be affected by this exact exploit method
        • it doesn’t affect case "$x" in … esac (quick check below)
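
        For that last point, a quick check with the same payload; case only does string/glob matching, so nothing is evaluated:

        x='a[$(echo "evaluated" >&2)]'
        case "$x" in
            42) echo "numeric" ;;
            *)  echo "not a number" ;;   # this branch is taken, nothing else runs
        esac
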
        1. 12

          Yes, this triggers because bash’s string-to-int code is actually a full-blown expression evaluator. If the operator isn’t an integer one, you’re fine.

    4. 20

      I am very confused as to what the article is actually trying to say. It seems to be a wishlist without any actual details.

      1. 4

        The author already has his own take on terminal emulators: Rio Terminal. So I’m reading this post as an announcement of his next project, implementing this wish-list – and as a call to action if any of the points resonate with others.

      2. 1

        I am very confused as well, but mostly because of said wishlist, since to me all those points are already present in quite a few popular (or not) editors.

        1. 1

          all those points are already present

          There’s one item that I can’t quite picture. Can you share an example of an editor that

          adjust to your way of typing and self adjust to increase typing speed

          I’d like to see what the author is aiming for, especially since they mention working with researchers on this.

          1. 2

            Seems like my brain faulted and I somehow missed that point, mea culpa. I don’t recall any editor that “adjusts to the way of typing” (which I assume means learning your coding/typing pattern to offer suggestions and fly-checks once you’re done typing or in between longer periods of inactivity). The second part of that point is either too vague or it just means the editor needs to be faster at “typing”, which, depending on the editor, is already a solved problem (though it depends on more factors than just the editor, as the machine running the whole stack also contributes to performance).

    5. 7

      I am starting to look more favorably towards Zig and their goal of having as few surprises as possible.

      1. 15

        I’ve written more Rust than I care to divulge, but I’ve only hit a bug related to this issue once, and that was entirely because it was a hobby project in which I was ignoring warnings. If I’d checked the warnings, it would have been obvious when I wrote it.

        1. 0

          Replace “Rust” with almost any other language and post it in a Rust forum and I promise you that this argument will be hammered down quicker than you can say “borrow checker”.

      2. 13

        Hey, at least Rust’s footguns have lints. https://github.com/ziglang/zig/issues/5973

        1. 9

          Note that the issue you linked has an accepted proposal to completely eliminate the footgun from the language. No lint, no naming-convention-based workaround. According to the milestone assigned to the issue, that’s expected to land in 0.15.0, or earlier if a contributor steps up and does the work before the core team gets around to it.

          1. 7

            I can’t help but notice that it was at one time or another also part of the 0.7, 0.8, 0.9, 0.10, 0.11, 0.12, 0.13, and 0.14 milestones… has something changed with the project management to make the milestone fields a more reliable indicator?

            Either way, I don’t want to take away from the main point that it’s a known issue in a pre-1.0 language with a planned and principled fix.

            1. 6

              No, not really. The number of issues solved per release is fairly consistent, however. I use those milestones to make sure I visit each issue before tagging the respective release, even if the issue ends up being postponed. That said, I have been using the “unplanned” milestone more aggressively lately.

            2. 1

              The load-bearing part is not the milestone, but the “accepted proposal”. The issue didn’t have an accepted solution before!

      3. 3

        At least this isn’t a memory safety surprise, and is extremely rare to encounter (you have to ignore numerous warnings).

    6. 1

      Let’s see whether Lobste.rs is the right platform for talking about separation logic, but, nonetheless, the OP might also be interested in my (technical) blog post on a complete proof system for separation logic: https://www.drheap.nl/articles/sound-and-complete-proof-system-for-separation-logic-part-1/

      1. 1

        Are you perhaps aware of any link aggregators where this is a common topic?

    7. 9

      ref is super useful when destructuring (for example Ok(ref foo)); the fact that you can do a “plain” let ref foo = bar is just a syntactical accident.

      1. 7

        Let’s also note that clippy will catch it and ask you to switch to let foo = &bar; instead.

      2. 1

        It’s not a “syntactical accident”. It’s a feature that let statements are also pattern matchers.

        1. 1

          Yes, that is why I call it “accident”.

          1. 1

            You called it an accident, now you’re calling it an “accident”.

            1. 1

              Are you assuming that the quotation marks in hauleth’s more recent comment are “scare quotes”? I see no reason to think that the quotation marks are not simply serving their basic purpose, namely marking the word “accident” as being quoted from an earlier comment, in which case calling it an accident and calling it an “accident” are exactly equivalent.

      3. 1

        I’ve never quite understood why you need to do, e.g., Ok(ref foo) instead of Ok(&foo).

        1. 7

          & is part of the pattern, so you’d be moving the value out of the reference, not getting a reference. And that’s not always possible if what you’re matching on is not owned.

          You can read more about it in the ref keyword docs and the reference it links.

          EDIT: the match ergonomics RFC is also relevant because it made writing & and ref less commonly required. And the before examples can help you understand when each should be used.

          EDIT 2: here’s a demonstration on the playground.

        2. 6

          Ok(&foo) will destructure foo to be a value, so if you have Result<&u32> then foo will be u32. On the other hand, if you do Ok(ref foo) on Result<&u32>, foo’s type will be &&u32. So in a pattern these two are each other’s opposites. Just like let ref foo = a is the same as let foo = &a, not let &foo = a.

        3. 4

          Ok(&foo) as a pattern, by syntactical analogy, will remove a layer of Ok, and then also remove a layer of &, matching an i32 out of a Result<&i32, _>.

    8. 2

      let mutex = mutex.clone();

      Are mutexes clone-able?

      1. 4

        The mutex is stored in an Arc at the start of main, and the clone is for that Arc.

          1. 7

            Indeed. To the point where it’s advised to use the Arc::clone(&mutex) form to make the code more explicit (even if for the compiler that’s basically the same thing).

            1. 2

              I think that’s also recommended to avoid conflict with a clone() method that doesn’t come from Clone on the inner object

      2. 4

        Are mutexes clone-able?

        No. But the mutex you are referring to is wrapped in a std::sync::Arc, an atomically reference counted variable:

        let mutex = std::sync::Arc::new(std::sync::Mutex::new(()));
        

        It is this atomically reference counted variable that we clone, thus creating a second owner and incrementing the internal reference count to two. This is perfectly valid and a standard design pattern. The mutex within exists only once.

      3. 2

        It’s an Arc<Mutex<T>>, so it’s behind a smart pointer. Edit: something seems to have bugged out and I didn’t see the earlier answers.

    9. 5

      If you want a hint on how this works, translate a minimal script and then run the result with bash -x.

      1. 5

        I can also recommend inserting declare -p between the commands!
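
        For example (min.sh being a made-up minimal script), the trace plus the variable dump look roughly like this:

        $ cat min.sh
        x=$((2 + 3))
        declare -p x
        $ bash -x min.sh
        + x=5
        + declare -p x
        declare -- x="5"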

    10. 1

      Is there any article that describes the algebra of types without Haskell? I’m Haskell-challenged, and I don’t understand the notation (what does “Bool -> Tri” even mean?)

      1. 4

        It’s a function type annotation: one argument of type Bool, and it returns the type Tri.

        1. 2

          Okay, but then I don’t follow why Tri -> Bool has eight outcomes. As I see it:

          Zelda -> true
          Zelda -> false
          Link -> true
          Link -> false
          Ganon -> true
          Ganon -> false
          

          Tri isn’t defined as being nullable, so I only count six outcomes, not eight. And I don’t think Tri is a tuple. Or am I not understanding the Haskell syntax of Zelda | Link | Ganon?

          1. 9

            There are eight possible functions:

            • always true
            • only true at Zelda
            • only true at Link
            • only true at Ganon
            • only false at Zelda
            • only false at Link
            • only false at Ganon
            • always false
          2. 5

            To add some context to @meithecatte and @kivikakk’s answer, Zelda -> true isn’t a value of Tri -> Bool, it’s a value of {Zelda} -> Bool. A value of Tri -> Bool would be something like Zelda -> true ++ Link -> true ++ Ganon -> false.

            1. 1

              As I stated: “Is there any article that describes the algebra of types without Haskell?” I don’t even comprehend what you are saying here. I always feel like there’s some serious gatekeeping going on when Haskell comes up.

              1. 5

                In case Rust is any better:

                https://justinpombrio.net/2021/03/11/algebra-and-data-types.html

                You can also try the links at the end of that.

                1. 1

                  It was. Thank you.

              2. 2

                Is it clearer like this?

                +--------+--------+
                |  Tri   |  Bool  |
                +--------+--------+
                | Zelda  |  True  |
                | Link   |  True  |
                | Ganon  |  False |
                +--------+--------+
                

                Another function value would be

                
                +--------+--------+
                |  Tri   |  Bool  |
                +--------+--------+
                | Zelda  |  False |
                | Link   |  False |
                | Ganon  |  True  |
                +--------+--------+
                

                Does that make it clearer why there are eight functions of type Tri -> Bool?

          3. 2

            To add some context to meithecatte’s answer, note that we’re talking about “inhabitants” of a type, i.e. how many distinct values it could have. “Zelda -> true” isn’t a function value, it’s a pair of one possible input and one possible output. There are six such pairs, but what we’ve counted there are actually just inhabitants of (Tri, Bool) again.

            Instead we need to count distinct behaviours a function could possibly exhibit. We have 2 possible outputs with 3 possible inputs. Imagine describing the function behaviour in a bitvector, where each bit describes whether the function is true or false for that particular input. Going with the Zelda, Link, Ganon order, let’s say it’s ZLG: 000 means it returns false on all inputs. 001 means Ganon gives true, but the two others give false, etc. Then we can see there’s 2^3 = 8 behaviours. Since the function is pure, any more complicated implementation eventually reduces to one of these.

            Bool -> Tri, OTOH, has 3 possible outputs, but two inputs; imagine the vector is of trits (0/1/2) instead. There we have 3^2 = 9 (00/01/02/10/11/12/20/21/22) possible functions.

            1. 4

              “Set -> Bool is a binary string” is how I remember whether the cardinality is To^From or From^To.

            2. 3

              Okay, that makes sense. Thank you.

              No wait, that only kind of makes sense.

              No wait … it makes sense. I think. God this stuff seems so esoteric. It took me over a decade to understand what a monad is, and I still don’t think I have it down fully.

              1. 1

                In my experience, repeatedly bashing your head on the topic, over and over again, over a period of years to decades, is indeed sufficient to get a good working grasp on it. (In other words: nice job! Keep it up!)

                But also taking the material to various limits can sometimes help, to limn the edges of it. To further simplify the examples and maybe make a little more clear the difference between (a, b) and a -> b, let’s look instead at Bool and Unit (only one value: unit. In Haskell this is written (), but let’s call it unit, since this isn’t actually dependent on Haskell or anything).

                (Bool, Unit) has two inhabitants: (true, unit) and (false, unit). The same is true for (Unit, Bool): (unit, true) and (unit, false).

                What about Unit -> Bool? In other words, how many different functions are there that, given a value that doesn’t vary in any way, return a boolean? There are two: one which always returns true, and one which always returns false. (In Haskell terms, these functions are const true and const false respectively! In JavaScript, you can write them as the lambdas () => true and () => false; no argument is as good as a unit argument.)

                How about Bool -> Unit? Here the non-symmetry becomes clear, which is quite different from the tuple case. How many different functions are there that, given a boolean, return the one same non-varying value? There’s only one possible return value, so there’s only one possible implementation: a function that returns unit regardless of the input boolean. (const () in Haskell, or (_) => undefined or similar in JavaScript.)

      2. 2

        See ~justinpombrio’s comment for a very nice Rust-flavoured explanation.

    11. 46

      This story describes how this result came together, which required the collaboration of more than 20 people (many without academic credentials), including one person who didn’t speak English and had to communicate using Google Translate.

      It also describes one collaborator, who: 1) dropped out of university and had no formal training, 2) learned Coq for intellectual stimulation, 3) formalized several of the team’s proofs in Coq; and then after this effort, 4) decided to become a train driver.

      1. 23

        Not to mention mxdys, a pseudonymous contributor who appeared out of the blue towards the end of the project and dropped a finished 40kloc Coq proof.

      2. 15

        That would be @meithecatte if I’m not mistaken!

        1. 17

          Hi :3

          (that is to say, yes, that’s me)

      3. 5

        Thank you! That article makes a great story, and shows just how beautifully weird the world can be sometimes. Also how much math gets done by random bored Eastern Europeans. What is it about Slavic culture or history or society that just seems to produce lots of people who tinker with weird math problems? And now we get to add an entire cadre of pseudonymous internet denizens to the mix. What a time to be alive.

        I was a little disappointed that the article didn’t mention what is to me the most interesting part of the Busy Beaver problem: it’s the complexity boundary between computable and uncomputable functions. No computable function can grow faster than BB does, by definition. Fortunately, there was another article in the sidebar that talks about exactly that. It also does a good job of illustrating some of the implications of that: Because you can write Turing machines that attempt to solve particular math problems like the Collatz conjecture, the BB problem encompasses lots of other problems by asking the question “is this function computable?”

        1. 8

          What is it about Slavic culture or history or society that just seems to produce lots of people who tinker with weird math problems?

          This is a just so story, but one version is that it was forced industrialization in the USSR. The state cared very much about bridges, tanks, planes, and similar stuff, and all that runs on math. So, there was a lot of state investment into mathematics, physics, and espionage.

          I can somewhat corroborate this: at my university, in my time, there was a very clear generation of old USSR-style profs who dominated the maths & physics classes.

          1. 6

            Even the International Math Olympiad (IMO) came out of the eastern bloc. It’s the original science Olympiad and the blueprint for modern math and science competitions.

          2. 4

            My math professors also regularly claim that math education used to be much more advanced back then, and that to this day the best books on several topics are the ones that came from the USSR. I believe the praise that should actually go to the USSR’s education system is that it was very good at recognizing talent and actually elevating it through very advanced, specialized classes.

            1. 5

              Recognizing, yes, elevating is, ahem, mixed: https://en.m.wikipedia.org/wiki/Sharashka

          3. 2

            The phenomenon predates the Soviets somewhat. If I had to make a similar guess, I’d point out Markov, as well as his brother Markov and son Markov. Between the three of them, they invented a large chunk of the modern maths required for computer science and made breakthroughs in everything from chess to meteorology. Markov Jr. is credited with founding the Russian school of constructivist mathematics, and like all heterodoxies, it had a cottage industry of researchers exploring its then-new principles.

            That said, the Soviets are definitely the cause of so many Slavic women in maths.

    12. 7

      I’ll conclude this brain dump by pointing out that much of the emerging narrative about this backdoor that you can read all over the net is based on idle speculation and selective interpretation of facts.

      Too true. Relatedly, I’ve never been more disappointed with the so-called Linux community’s (the sense of community decreases as the population increases) response to an issue. Lots of panicking, fear-mongering, and FUD about FOSS and its processes, mostly based on vibes. Don’t get me started on the Windows shills. Seeing this called the “Linux backdoor” has really pissed me off.

      1. 14

        Like it or not, Linux is mainstream now. Plenty of enterprises have gone from not knowing anything about it, to vaguely knowing that some parts of it are used in their organization, to knowing it’s an integral part of it, and that big events are as likely to hit them as any Windows-based flaw. Hence, panic, because the reporting is just as invested in spreading plausible misinformation for clicks here as it is for any other product.

        Meanwhile, existing, deployed serious breaches in VPN software (Ivanti and others) are basically background noise in the venues where the xz issue is described in apocalyptic terms.

        1. 1

          in VPN software (Ivanti and others)

          I thought Ivanti was some niche thing that my last employer happened to have. I’ve never used a more mismanaged piece of software.

          1. 1

            It used to be called something else, and they rebranded it (in the most incompetent way possible).

            At a previous workplace we RDP’d into a lot of customer systems, so I got to use a bunch of Windows-based VPN solutions. They all had abysmal UIs, and I can’t help but think it was a symptom of crappy coding overall.

            1. 2

              They used to be LANDESK, then they acquired HEAT, and went through a rebrand together to Ivanti.

              1. 2

                Wasn’t it also called Pulse Secure or similar?

                1. 3

                  Yeah, Ivanti bought Pulse Secure in 2020, but initially kept it as a separate brand. It was merged in to the Ivanti client last year.

      2. 7

        The discourse that I saw around this was:

        • “We need big tech funded consortiums” – this already seems to exist. Back in 2014, Linux Foundation threw some money at the GNU bash project after ShellShock. Certainly funding can be improved, but I don’t think this is the core issue.
        • Analysis of the social engineering
        • Some great analysis of the shell scripts
        • Some great analysis of the executable payload

        I did not see enough discussion of choices by Debian and Red Hat:

        • dynamically linked deps – the sshd -> systemd dep, and the systemd -> xz-utils dep.
          • This dependency does not exist on BSDs (e.g. because systemd only runs on Linux)
        • The GNU ifunc mechanism for runtime swapping of faster functions – which I didn’t know even existed until the xz backdoor.
          • Again this does not exist on BSDs (oops, seems to exist on FreeBSD?). I think it’s a runtime mechanism to avoid building multiple ELF files for different CPUs, e.g. similar to how OS X has multi-arch binaries, and Python wheels can ship multiple architectures.
          • It introduces a new attack surface when combined with dynamic linking (a quick way to check your own system is sketched below).
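
        A rough way to check your own machine for those two ingredients (paths and library names are assumptions and vary by distro; on x86-64 glibc systems, ifunc use typically shows up as IRELATIVE relocations):

        # 1. Does sshd link liblzma, directly or via libsystemd?
        ldd /usr/sbin/sshd | grep -E 'liblzma|libsystemd'

        # 2. Does the installed liblzma contain ifunc resolvers?
        liblzma=$(ldconfig -p | awk '/liblzma\.so/ {print $NF; exit}')
        readelf --relocs --wide "$liblzma" | grep -c IRELATIVE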

        So I think the root cause of this exploit is NOT necessarily “open source maintenance is under-funded” or even “autoconf and horrible shell scripts” (though those can’t be eradicated soon enough!)

        I believe the root cause is a bug in the architecture of Debian and Red Hat. And these were contributed to by GNU libc and systemd. (BTW I only use Linux – never BSDs – and I’m very disappointed by this.)


        My thought experiment is:

        1. Suppose I develop a compression library libzandy.so, and a command line tool zandy
        2. Suppose it’s installed on all Debian and Red Hat machines.
        3. But it’s not dynamically linked into sshd.
        4. In contrast, liblzma.so is dynamically linked into sshd.

        Is Jia Tan going to attack me, or is he going to attack the maintainer of xz-utils?

        I believe strongly it’s the latter. For the same reason that BSDs were not attacked – there’s no avenue of attack there. You can backdoor xz-utils on BSDs, but it does nothing as far as getting you into the system remotely.

        Dynamically linked deps + GNU ifunc opens up the avenue of attack, which was not there in 2015.


        This sshd -> xz-utils dependency was only introduced in 2015, so Debian and Red Hat did a big disservice to the xz-utils maintainer by architecting their distros this way.

        They made him a target, likely of nation-states. After 10+ years of the xz-utils project existing.

        The backdoor’s shell scripts don’t bother to do anything when it’s not Linux, because other Unixes aren’t vulnerable to this attack.

        Likewise, Alpine Linux is not vulnerable to this attack, because it uses neither systemd nor GNU libc (instead, OpenRC and musl libc).


        I would like to write a blog post about this, but I need time to play with the actual code first, like using GNU ifunc, to make sure I understand it fully.

        Good discussion here (sorry it got a bit heated, but there are good comments) - https://lobste.rs/s/uihyvs/backdoor_upstream_xz_liblzma_leading_ssh#c_wgmyzf


        Why do I focus on this aspect? Because it’s the easiest thing to fix. In contrast:

        • “open source maintainership” - big fuzzy problem, every solution has upsides and downsides
        • autoconf/shell “removal” - working on it, but is approximately the same amount of work as replacing C++ with Rust

        IMO Debian and Red Hat need to not just revert xz-utils to a previous version, but clean up the dependencies of sshd.

        That is concrete, and easily doable, without loss of functionality.

        1. 11

          So I think the root cause of this bug is NOT necessarily “open source maintenance is under-funded” or even “autoconf and horrible shell scripts” (though those can’t be eradicated soon enough!)

          I don’t see this as a “bug”, I see this as a meticulously crafted exploit. There’s zero chance that a mere software bug would have enabled the planned behavior sought by the person or entity directing this effort.

          I believe the root cause is choices by Debian and Red Hat. And these were contributed to by GNU libc and systemd.

          The root cause is that Linux is now an integrated part of the global economy, and that Red Hat and Debian’s packaging are the dominant way for the most popular Linux distros to get their software. Therefore, it made sense for the attacker to try to find a way to ju-jitsu their exploit into them via the practice of building from release tarballs, rather than directly from source.

          Some people have argued that the planned decoupling of sshd from libsystemd forced the attacker’s hand, possibly prompting them to accelerate their efforts to push this into distribution despite bugs.

          There’s no one cause one can point to that would magically fix this. If one of the BSDs had the dominant position Linux has now, attackers would be incentivized to look for loopholes to exploit, because the payoff would be so great. And C being C, loopholes would be found.

          This most probably means that there are parallel efforts afoot to subvert distributions. Hopefully the issues raised by this will get more eyeballs on the problem!

          1. 4

            Edited the comment – the root cause of the exploit is an architecture bug – that’s my belief

            There’s no one cause one can point to that would magically fix this.

            There’s no magic fix, but again I point out

            1. Attack isn’t possible before 2015 on Debian and Red Hat
            2. Attack isn’t possible on BSDs
            3. Attack isn’t possible on Alpine Linux
            4. There is a concrete mitigation. You can fix it now – on top of rolling back xz-utils.

            Rolling back xz-utils is not enough.

            Even reverting every commit Jia Tan has ever made is not enough.

            That removes the exploit, but it doesn’t remove the architecture bug.

            It fundamentally fails “defense in depth”.

            I am concentrating on actions that can be taken, rather than philosophy / casting blame. (The blame appears to be spread between Debian, Red Hat, GNU libc, and systemd, so it’s beside the point. What matters is what gets fixed.)

            1. 8

              Thanks for updating and expanding!

              I agree that this is due to a complex interplay of different mechanisms. And I also believe that if one BSD had the “market share” of today’s Linux, an attacker would be incentivized to find a way to subvert it. In other words, what is protecting the BSDs (and niche distros like Alpine) is not technical excellence, but relative obscurity/“irrelevance”.

              1. 4

                given enough attackers’ eyeballs, all bugs are shallow ;)

            2. 4

              Same point from Russ Cox

              Does pcre want a maintainer?

              https://hachyderm.io/@rsc/112207332307637828

              It’s dynamically linked into sshd on PopOS apparently.

              Russ claims it doesn’t need to be. If so, it should be removed.

              It’s an architecture bug that’s attracting Jia Tans.

              “Trusted computing base” seems to be a forgotten concept in the Linux world.


              Can a BSD or Alpine Linux user tell me if the PCRE dependency exists for sshd on their systems? – ldd $(which sshd)

              1. 6

                sshd doesn’t use libpcre on FreeBSD 14, but it does use a bunch of other stuff:

                libpam.so.6 => /usr/lib/libpam.so.6 (0x1401843f3000)
                libprivatessh.so.5 => /usr/lib/libprivatessh.so.5 (0x140183570000)
                libutil.so.9 => /lib/libutil.so.9 (0x140185927000)
                libbsm.so.3 => /usr/lib/libbsm.so.3 (0x1401848c4000)
                libblacklist.so.0 => /usr/lib/libblacklist.so.0 (0x140185671000)
                libgssapi_krb5.so.10 => /usr/lib/libgssapi_krb5.so.10 (0x14018692b000)
                libgssapi.so.10 => /usr/lib/libgssapi.so.10 (0x140186fb7000)
                libkrb5.so.11 => /usr/lib/libkrb5.so.11 (0x140188c33000)
                libwrap.so.6 => /usr/lib/libwrap.so.6 (0x140187b86000)
                libcrypto.so.30 => /lib/libcrypto.so.30 (0x14018967b000)
                libc.so.7 => /lib/libc.so.7 (0x14018ac69000)
                libprivateldns.so.5 => /usr/lib/libprivateldns.so.5 (0x140188950000)
                libcrypt.so.5 => /lib/libcrypt.so.5 (0x140189cac000)
                libz.so.6 => /lib/libz.so.6 (0x14018b575000)
                libthr.so.3 => /lib/libthr.so.3 (0x14018ba7e000)
                libroken.so.11 => /usr/lib/libroken.so.11 (0x14018ceb3000)
                libasn1.so.11 => /usr/lib/libasn1.so.11 (0x14018bffa000)
                libcom_err.so.5 => /usr/lib/libcom_err.so.5 (0x14018db2c000)
                libhx509.so.11 => /usr/lib/libhx509.so.11 (0x14018de72000)
                libwind.so.11 => /usr/lib/libwind.so.11 (0x14018eecf000)
                libheimbase.so.11 => /usr/lib/libheimbase.so.11 (0x14018ec63000)
                libprivateheimipcc.so.11 => /usr/lib/libprivateheimipcc.so.11 (0x14018f462000)
                libssl.so.30 => /usr/lib/libssl.so.30 (0x14019009e000)
                
              2. 4

                ldd for sshd on Alpine is very lean:

                /lib/ld-musl-x86_64.so.1 (0x7f12b7596000)
                libcrypto.so.3 => /lib/libcrypto.so.3 (0x7f12b7000000)
                libz.so.1 => /lib/libz.so.1 (0x7f12b74ab000)
                libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f12b7596000)
                
              3. 3

                That dependency doesn’t exist on Arch either. Debian apparently pulls it in via libselinux.

        2. 6

          The GNU ifunc mechanism for runtime swapping of faster functions – which I didn’t know even existed until the xz backdoor.

          • Again this does not exist on BSDs.

          I think it does exist; for example, see the FreeBSD man page.

          1. 2

            Thanks, I updated the comment … that’s why I mentioned needing to play with GNU ifunc a bit before writing!

            There are some mentions of ifunc in OpenBSD - https://man.openbsd.org/OpenBSD-7.2/ld.1

            But there are some warnings and experimental options there? It doesn’t seem as prevalent?

            I think OpenBSD is more source-based, and you would just compile multiple executable files. Not replace individual parts of an executable image at runtime.


            I’m not sure if there is GNU “blame” then.

            Maybe FreeBSD implements ifunc because they want to have GNU packages run fast?

            I know that people have said FreeBSD shell actually implements bash features, because there’s a natural desire to run things from the Linux world.

            But I still stand by dynamic linking as the root cause, e.g. see Russ Cox link I quoted too

            1. 5

              It doesn’t have anything to do with GNU packages xD FreeBSD heavily uses ifunc in the libc and the kernel for anything that has CPU-dependent acceleration available – checksums, hashes, ciphers, mem/str routines, etc.

        3. 3

          My understanding is that systemd isn’t the only patched-in dependency of sshd that pulls in xz.

          In the specific case of systemd, the right solution is for openssh to implement support for the systemd notify protocol, which they’ve done here: https://bugzilla.mindrot.org/show_bug.cgi?id=2641
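
          For a sense of how small the protocol is: the service just sends a datagram such as READY=1 to the AF_UNIX socket named by $NOTIFY_SOCKET. A rough sketch using socat as a stand-in (this illustrates the protocol, not what OpenSSH actually does):

          # minimal "READY=1" notification without linking libsystemd
          notify_ready() {
              [ -n "${NOTIFY_SOCKET:-}" ] || return 0   # not started via Type=notify
              if [ "${NOTIFY_SOCKET:0:1}" = "@" ]; then
                  # abstract-namespace socket
                  printf 'READY=1' | socat - "ABSTRACT-SENDTO:${NOTIFY_SOCKET:1}"
              else
                  printf 'READY=1' | socat - "UNIX-SENDTO:${NOTIFY_SOCKET}"
              fi
          }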

          1. 4

            I strongly disagree with this. Without getting too deep into systemd flame wars: it’s overly complex and highly subject to security problems as a result, like anything that has gotten too big and pushed for standardization. You also have a monoculture of Unix. This in turn causes too much complexity, which in turn leads to these kinds of problems. Complexity has always been the enemy of security.

            1. 4

              Whether it is or is not (systemd has analogues on every other mainstream OS), it’s not relevant to the narrow point – that the systemd notify protocol is very simple, and OpenSSH can just implement it without taking on a new dependency.

              Also, the protocol solves a real set of problems – and other init systems should also grow the same protocol for services to indicate that they’ve finished starting up. But other init systems rely on janky daemonization to achieve the same goal, which I think is significantly worse than what systemd does. (I for one am very glad that systemd heralds the end of double-forking.)

              I don’t think it’s helpful to generally gesture about “complexity” when the only relevant surface area is this.

              1. 6

                The systemd developers only started recommending that daemons should have their own implementation of sd_notify instead of linking to libsystemd after the xz backdoor was revealed https://lobste.rs/s/uihyvs/backdoor_upstream_xz_liblzma_leading_ssh#c_uogq6b

                It isn’t recommended in the “latest” man page

                1. 2

                  Thanks. That’s definitely concerning – Lennart said on Fedi that he’s been recommending people write their own implementations, and I have no reason to disbelieve him – but clearly the documentation didn’t make that clear until the xz hack. That’s disappointing.

              2. 3

                Whilst the systemd notify protocol is very simple today, there’s no guarantee it will be in the future. I can see the appeal of not having it in your codebase – it helps mitigate that different future world where the notify framework has to change or differs by distribution, etc. So the question is more: should OpenSSH import an OS-provided library, or should it have to care about every possible future implementation on every possible deployment?

              3. 3

                I don’t mean to be disagreeable, but the old way of doing things, by actually putting them in /var/log/messages, worked fine for a very long time until systemd decided to use a binary logging format. Why should the authors of SSH have to pander to systemd Linux nonsense?

                1. 2

                  I think you really are being more disagreeable than you’re intending to be here. The concerns about logging are valid (there are both solid reasons for and real downsides to binary logging – I think what I’d actually prefer overall is neither /var/log/messages nor systemd’s default binary logging but bunyan-style JSON logs on disk, along with a separate sqlite database containing various indexes on those logs). But they are also completely irrelevant to the discussion at hand, which is that services should have a way of telling the service manager that they’ve finished starting up and are ready to serve requests.

                  Other service managers can and should grow this functionality, and they can use the same protocol even if they have nothing else in common with systemd.

    13. 2

      Keep working on Rekishi, an experimental Common Lisp development environment I’ve been building lately.

      The idea I’m exploring is to keep track, by default, of all the modifications you make to your code and to let you browse that history. I’m also exploring an Emacs plugin that allows you to bring any symbols you want into your workspace and operate on them without relying on files for organization.

      1. 1

        I assume then that the name comes from the Japanese word for “history”, and not the Japanese word for “roadkill”.

    14. 8

      Suid binaries are evil.

      They can be run under very different initial conditions, and thus expose code paths that were not designed for those conditions.

      Instead of such binaries, one should make services that run as root under well-known initial conditions.

      In https://stal-ix.github.io/ we don’t have any suid binaries in the system; even sudo works as an ssh client + local ssh daemon.

      1. 6

        Or make the suid binary not runnable AT ALL by non-wheel group members. There is a NixOS option that does this, which makes sudo exit with an error, enforced by the kernel. This solves all suid-related vulnerabilities.

        I’m not sure how it is done on other distros, but it’s security.sudo.execWheelOnly on NixOS: https://github.com/RGBCube/NixOSConfiguration/blob/master/modules%2Fsudo.nix#L17

        Suid binaries are definitely not evil.

        1. 3

          This sounds like it just sets the permissions to rwsr-xr--.

          1. 4

            The implementation of execWheelOnly:

                security.wrappers = let
                  owner = "root";
                  group = if cfg.execWheelOnly then "wheel" else "root";
                  setuid = true;
                  permissions = if cfg.execWheelOnly then "u+rx,g+x" else "u+rx,g+x,o+x";
                in {
            

            So, you seem to be correct.
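
            On a non-NixOS system you can get roughly the same effect by hand; a sketch (path and group name are assumptions, and a package upgrade may reset them):

            chown root:wheel /usr/bin/sudo
            chmod u+rwxs,g+rx,o-rwx /usr/bin/sudo   # setuid root, wheel may run it, others may not
            ls -l /usr/bin/sudo
            # -rwsr-x--- 1 root wheel ... /usr/bin/sudo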

      2. 2

        Is a suid binary a requisite for this flaw to be exploited?

        1. 11

          It would be quite unusual for a daemon to be started with an overflowing argv0 by other means.

          1. 3

            Not that weird though. For example some services will create per-slice/task/customer processes that change the argv0 to include the identifier.

          2. 2

            Thanks for expanding! I found the linked article a bit light on these details.

    15. 19

      I haven’t seen the term “nominal type” used this way before. Usually when people speak of a nominal type system they mean one where two types are equal if they have the same (fully qualified) name. This contrasts with structural type systems, in which two types are the same if they have the same structure (and obviously there needs to be some notion of equality of primitive types). Notably, the various schemes to distinguish structurally equivalent but semantically distinct types rely on a nominal type system to work—in a structural system all the various IDs would just be aliases of one another.

      1. 11

        What the author describes I have traditionally seen called a “phantom type”.

        1. 3

          Yeah, this is what I’d call them. This is where PhantomData gets its name from (it’s even mentioned in the docs).

      2. 7

        I’d wager the author has used TypeScript in the past. TypeScript doesn’t have the concept of newtypes; almost everything is structural, and if you want to achieve something similar to what is shown in the post you’d have to use some hacks (usually called “brands”) that look similar to what the post describes. This process is often described in the broader blog post literature as “nominal typing”.

        1. 3

          But that’s literally what structural typing is, as opposed to nominal typing. These are some of the rare terms in CS with a quite well-defined meaning.

          1. 2

            Okay, yeah, TypeScript’s type system is structural. And the branding trick allows you to derive what’s essentially nominal type semantics in an otherwise structural type system.

      3. 3

        Don’t most type systems have elements of both? For example Haskell tuples and Java arrays are both structural, while newtype and Java classes are both nominal.

        1. 7

          Haskell tuples and Java arrays are not structurally typed. If they were, I could do something like this:

          data CharAndBool = Tuple Char Bool
          
          tuple :: CharAndBool
          tuple = ('a', True)
          

          This would be possible because the types CharAndBool and (Char, Bool) would type-check as equal. In Haskell, they are not equal, because although they have the same structure, they have different names.

        2. 3

          Unison is an example of a language that has elements of both. If I define

          structural type UserId = UserId Nat
          structural type GroupId = GroupId Nat
          unique type ProjectId = ProjectId Nat
          

          then UserId and GroupId will be equal, but ProjectId will not be equal to either of them, only itself.

        3. 1

          Java’s type system is nominal only, AFAICT.

    16. 6

      Unfortunate name collision: https://www.sagemath.org/

      1. 1

        Sage Math is Sage

        Sage programming language is SagePL

    17. 7

      I can’t find source code for whatever the hell actually generated that program, but from the PDF describing the algorithms the toolchain sounds legendarily cursed. The summary chapter implies a lot of it is written in asm with asm2bf compiling it to brainfuck? APL is used to describe many of the algorithms involved. The included fast20.c Malbolge interpreter already looks like an IOCCC entry.

      Absolute legend.

      1. 2

        I read through parts of the included book and nothing felt like staring into the abyss more than this. I am impressed.

        1. 3

          I was convinced to give it a go and, wow, yep, it just continuously escalates somehow.

    18. 11

      I suppose using a protocol like Zmodem could help with the file corruption issues.

      1. 1

        This is layer 2 right? How about layering TCP/IP on top?

        1. 3

          The TCP checksum is very weak. ZMODEM’s CRC would do a better job over a noisy channel, but if it’s very noisy then ZMODEM’s NAK restarts will be slow. At which point it’s time to consider forward error correction.

        2. 2

          I wouldn’t know! I just remember being 12yo, transferring files onto the Amiga using Zmodem with my dad, going “huh, sometimes it errors and retransmits a small chunk. where do the errors come from?”

    19. 1

      This news is a little too convenient for generating outrage at the evil manufacturer corporation, so it makes me suspect there’s more at play here. For example, if there are third-party shops that e.g. refuse to change brake pads and simply fool the firmware into accepting used ones, and the train crashes, the manufacturer might be held responsible – if not by the courts, then at least in a PR sort of way. If there’s no certification process and the repair shops are just random companies that don’t know what they’re doing, it might actually be sensible to prevent them from doing maintenance. Of course, they should have built in a different way of doing this, for example in the contracts, or with a clearer message in the firmware.

      1. 9

        Why should that be solved by sneaky software and not just a contract? Possibly amended by not so sneaky software to remind you of said contract.

        There are very good reasons to shun third party workshops, but what if they’re the only option at one point? The company goes bust and Poland is stuck with randomly locking trains forever?

      2. 8

        For a long time, servicing the trains was done solely by the manufacturers, as they argued that the service manual was their intellectual property. Since then, the courts have decided that no, they have to provide it, and other companies could enter the train servicing market. Newag’s hand forced, they complied, with a bitter taste in their mouth.

        The service manual that was handed over neglected to mention that some of the trains will lock up if left in place for 10 days, and that they’ll have to be unlocked with an undocumented sequence of button presses in the cabin – the assumption behind this being that trains don’t just stand around unused unless they’re being serviced. After this assumption turned out to be not entirely true, a newer version of the firmware included checks on the GPS position of the trains, with hardcoded coordinates of some competitors’ workshops.

        Not sure if you’ve read the more detailed article, or only the Mastodon posts, but I recommend putting it through Google Translate if you can’t read Polish.

        Newag Impuls trains breaking down randomly has been going on for over a year. Mysteriously, only third-party workshops had issues with this. Newag always said “see, they’re just incompetent”. An unspoken suspicion had long formed in the industry that some underhanded tactics were at play, but until now, nobody had any idea how to go about proving it.

        1. 0

          Newag replied in a press release that the third parties could be responsible for inserting this code into the firmware. I can’t decide who can be trusted here. Both scenarios seem plausible.

          1. 6

            Both scenarios seem plausible.

            What motivation would SPS (the third-party repairer) have to brick the trains they’re being paid to maintain?

            1. 3

              To ruin the reputation of the manufacturer so that they’re the only ones allowed to repair it.

              1. 5

                That is pretty much pointless TBH, especially since they were running against the schedule and were close to being late on the fixes. With that in mind it looks especially dumb. Also, the bogus code was found in other trains that weren’t serviced by SPS.

              2. 4

                Also, I do not get what SPS is supposed to gain here. They aren’t the only train servicing shop in Poland, so it is not as if everyone who uses NEWAG trains will go and service their trains at SPS. SPS has next to nothing to gain here; it is only NEWAG that can lose. And SPS isn’t even a direct competitor of NEWAG, as NEWAG is a train manufacturing company while SPS is just a servicing company. It is like saying that iFixit’s fight for the Right to Repair is a plot to ruin the reputation of Apple/Samsung/other producers so that only they can service smartphones.

      3. 3

        it might be actually sensible to prevent them from doing maintenance

        3rd party repair has been one of the conditions of the train tender. Newag shouldn’t have offered their trains if they were not OK with the terms. Period.

    20. 15

      great work! this seems very illegal (on the part of the vendor) to me.

      we do have a talk scheduled about this at 37C3, in which we plan to do a deep dive into this and actually publish our findings.

      looking very much forward to this!

      1. 2

        Why do you think it’s illegal? The Polish government body that oversees train transport clearly stated that this kind of thing is not prohibited by law. The only path forward is to prove wrongdoing based on the legal contracts in a civil trial.

        1. 9

          the errors that were shown to the operator were not real, they were bogus. and if i’m reading the toots right, they could only be removed with a secret key combo by the original manufacturer - who was not awarded the service contract (so i presume these were billed to the operator. even if they weren’t, the intention was to wrestle the lucrative service contract away from a competitor). the first part (to me) feels disingenuous, the second outright fraudulent. especially since, as Psentee commented, third party maintenance was explicitly allowed.

          edit: sabotage. the word i was looking for was sabotage.

          1. 10

            There’s an article in Polish. The relevant part, translated by DeepL:

            Updated 2023-12-05 16:00

            We have received the official position of the UTK, which we quote below in full:

            The President of UTK is aware of the matter and has verified the information regarding the analysis of the software of railroad vehicles that has been carried out, and is cooperating with the relevant services in this regard. A meeting with the vehicle manufacturer was organized jointly with CERT Poland (a team set up to respond to security incidents on the Internet). The vehicles meet the essential requirements set by the provisions of European directives. It is the purchaser of the vehicle, within the framework of contractual freedom, who determines the terms of servicing and warranty. Such requirements are included in train purchase contracts. Any restrictions on serviceability, including restrictions introduced in the software, may constitute a potential civil law dispute between the ordering party and the manufacturer. The president of UTK in this case is not the competent authority.

            According to Article 41(2) of the Act of July 5, 2018 on the National Cyber Security System (i.e., Journal of Laws of 2023, item 913, 1703), the competent authority for cyber security for the transport sector (excluding the water transport subsector) is the minister in charge of transport.

            Perhaps the details of how egregious this was got lost along the way.

            EDIT: another outlet reports that the Central Anti-corruption Bureau is looking into this, independently of the train authority’s position that everything’s a-ok.

            1. 1

              Oh, I’ve only read the article before it was updated. Thanks!

          2. 1

            Last toot in the thread links to the source.