1. 146

    1. 71

      I wrote up some documentation on how all this works and known issues here: https://docs.fedoraproject.org/en-US/fedora-asahi-remix/x86-support/

      You can also watch Alyssa’s talk at XDC2024 here: https://www.youtube.com/watch?v=pDsksRBLXPk (starts around 20 minutes after stream start).

      Thank you to everyone who worked to make all this possible! ^^

      1. 8

        Wow… Fallout 4 on Linux on an M1. I would never have thought that possible. Incredible!

      2. 4

        Y’all are awesome! You’ve inspired me to learn Rust and get into hardware more!

      3. 3

        I don’t have anything insightful to add but I’d like to say anyway that this is incredibly impressive.

      4. 2

        Love what you Asahi guys are doing! Next-level stuff.

      5. 1

        Very very nice! I sponsor one of you (sorry, I forget who… probably multiple) on GitHub for a reason!

        What’s the priority (if any) on getting these things working on the M3 or the soon-to-be-released M4? (and of course the Pro and Max variants)

        The way the blogpost just drops well-known AAA games inline while discussing the latest efforts is hilarious.

        Keep on hackin’!

    2. 23

      When I see the effort some people put in, mostly in their spare time, to give people the ability to run extremely closed-source software on extremely closed-source hardware, with FLOSS in the middle, I’m just amazed… in a good way.

      I’m happy that they’re doing it, and that I’m not… These people have my absolute respect.

      1. 10

        What makes you call the hardware extremely closed source? All the firmware is closed source, of course, but that’s the case for pretty much all hardware; what makes Mac hardware unusually closed? It’s not especially locked down: all the mechanisms used to install and boot different operating systems were intentionally included by Apple, after all. It’s not like Asahi relies on some kind of jailbreaking.

        1. 3

          This might be ignorance on my part; I’m happy to be corrected.

          But I’ve always assumed that Macs didn’t use any standard BIOS, and that the new ones use some non-standard UEFI boot process. My understanding was that, while it’s true that their custom firmware allows you to boot another OS, this is something your OS has to put effort into supporting, because Apple doesn’t care about you, as opposed to Intel/AMD/Asus/… who actively contribute to the Linux kernel to support their hardware. My understanding, as well, is that there is very little free documentation available on how to support Apple’s hardware, hence my “extremely closed source”.

          Why else would we need the efforts of the Asahi team, if the hardware (which, I admit, I used as a metonym for hardware+firmware) were standard?

          1. 8

            The firmware and bootloader become fairly uninteresting once you’ve successfully booted the operating system, and while there are lots of fiddly bits to take care of to get things running smoothly, once you’re booted, you’re booted (usually). Booting is not normally a herculean effort, as long as the hardware and firmware vendors haven’t tried to actively block you. Which Apple hasn’t, apparently (assuming Secure Boot is disabled).

            That said, Apple’s GPUs are entirely undocumented, so building a high-quality GPU driver for these things purely based on reverse-engineering (intercepting and tracing what I/O the macOS GPU driver performs on the GPU), especially in this kind of time frame is beyond impressive.

            Other devices and SoC subsystems are also mostly undocumented, of course, and need their own drivers, but the complexity of most devices is not in the same league as a GPU. USB is almost certainly a standard implementation, though (perhaps with some smaller hacks), so that’s one big driver stack they won’t have to reimplement.

            1. 3

              > USB is almost certainly a standard implementation though

              Correct! Most Apple peripherals are custom (though some have fun heritage, like the Samsung UART, lol), but core USB is the same Synopsys DesignWare block you see in nearly everything (that’s not Intel or AMD) these days.

              1. 5

                That’s just the host/device controller, though. To make it work you also need a USB PHY driver (Apple custom), a USB-PD controller driver (a partly incompatible Apple variant of an existing TI chip), and the I2C controller driver for the bus that chip talks over (actually a variant of an old PASemi controller).

                1. 2

                  Yeah, by “core” USB I meant the host/device controller :)

                  Frankly, this is exactly why I like ACPI. With ACPI, all that glue-ish stuff around a generic core interface like xHCI can mostly (hopefully) be written once in AML, instead of having to be written and maintained as drivers in each kernel. Yeah, from a Linux-centric perspective that last part is not a feature, but I personally like the BSDs, for example, and I also hope that “the future” will be more Redox-shaped or something, so I’m always biased in favor of things that make OS diversity easier!

              2. 2

                Heh, I was working on some low-level USB code on a microcontroller for a client. I was staring at the code and having this moment of déjà vu: “weird… I’ve never used this brand of chip before, but for some reason a lot of this feels very familiar.”

                I had worked on USB code before on different chips, though. I opened up the datasheets from an old project and this new project and started comparing the USB registers. Sure enough: exactly the same names, exactly the same bit layout, exactly the same memory addresses. Googling some of the names turned up datasheets from multiple vendors, and then one from Synopsys. It sure helped debugging to have my own reference implementation to compare against!

            2. 3

              > Which Apple hasn’t, apparently

              Booting other OSes is intentionally supported, and doing so without compromising security for people who aren’t doing so is a significant amount of work. There’s a bunch of material describing how non-Apple OSes are installed without trivially undermining Secure Boot, and I think back in the early days of Asahi marcan talked about it a bunch.

    3. 6

      This might mean that Fallout 4 runs better on Asahi Linux than it does on a modern Windows 11 machine.

      To quote the Chalkeaters, “The Creation Engine’s aging very well”.

      I’m interested to see how this progresses: with AArch64 getting more popular, letting Wine run binaries across platforms (and page sizes) is going to become more and more important, and the last time someone really tried it was for ppc-macos with Darwine.

      1. 7

        > This might mean that Fallout 4 runs better on Asahi Linux than it does on a modern Windows 11 machine.

        In my experience, older games tend to work better under WINE on macOS or Linux than they do under newer versions of Windows.

    4. 4

      Interesting that virtualization was chosen as the solution to the page-size mismatch problem, instead of, say, adding 16K page support to the relevant tools/programs.

      I looked around and found this: https://github.com/FEX-Emu/FEX/issues/1921. So I guess the reason was that 16K page support would be too hacky?

      1. 31

        You can’t add 16K support to proprietary software that was already compiled for x86 and assumes 4K pages, which is what we’re trying to run ^^;;

        You can hack around it in some cases (that’s what box64 does, and what was suggested in that issue), but there is no generally compatible solution other than invasive MMU emulation (extremely slow), 4K process support in Linux (very hard and invasive), or just running 4K kernels on bare metal (there’s a FAQ entry about why we didn’t pursue that further).
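
        For the curious, here’s a minimal sketch (an illustration, not code from that issue) of the kind of baked-in 4K assumption that can’t be patched out of a closed binary; the fixed address is made up:

        ```c
        /* Sketch of a hardcoded 4K assumption (illustrative only). On a
         * 16K-page kernel the MAP_FIXED_NOREPLACE request below fails,
         * because the address is 4K-aligned but not 16K-aligned. */
        #define _GNU_SOURCE
        #include <stdio.h>
        #include <sys/mman.h>
        #include <unistd.h>

        int main(void) {
            printf("kernel page size: %ld\n", sysconf(_SC_PAGESIZE));

            void *p = mmap((void *)0x10001000, 4096, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                           -1, 0);
            if (p == MAP_FAILED)
                perror("mmap");   /* EINVAL when pages are 16K */
            else
                printf("mapped at %p\n", p);
            return 0;
        }
        ```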

        1. 3

          Another super fun bit you run into: even software that intentionally tries to do the right thing by calling getpagesize() can be screwed, because that is typically just a macro that expands to a constant. In other words, even though the source looks like it is doing the right thing, the compiled code just baked in 4096.
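
          To see the trap concretely, here’s a sketch; the LEGACY_GETPAGESIZE macro is a made-up stand-in for what some old libc headers effectively did (modern glibc resolves the page size at runtime instead):

          ```c
          #include <stdio.h>
          #include <unistd.h>

          /* Hypothetical stand-in for a legacy header: a "page size" the
           * compiler folds to a constant, so already-shipped x86 binaries
           * carry 4096 forever, whatever the kernel actually uses. */
          #define LEGACY_GETPAGESIZE() 4096

          int main(void) {
              printf("compile-time page size: %d\n", LEGACY_GETPAGESIZE());
              printf("runtime page size:      %ld\n", sysconf(_SC_PAGESIZE));
              return 0;
          }
          ```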

    5. 3

      Impressive work, Asahi people!

    6. 2

      Installed nixos-apple-silicon on my MacBook Pro M2 just last night, chuffed to see this. Seeing as nixos-apple-silicon is mostly Asahi Linux without the Fedora parts, I’ll be excited to utilise this in the future (until I return this work laptop I guess).

    7. 1

      I wish there were some way to dual-boot Asahi off an external drive.

    8. 1

      It makes me happy that someone is interested in making these devices useful for more people, and extending their lifespans, when the manufacturer designates them “vintage” 5 years after they stop selling them, and “obsolete” after 7 years.

      1. 1

        I have never seen Apple designate hardware they have sold in the past as either “vintage” or “obsolete”.

        1. 2

          They do, but it’s a support thing, and unless you work at an Apple Store/AASP, it probably doesn’t matter much to you.

          1. 1

            Thanks for the clarification! I stand corrected.

    9. 1

      When I’ve tried emulating x86-64 on Apple Silicon using QEMU, it’s been incredibly slow: running ls took 1-2 seconds. So if these fine people manage to emulate games, then I’m very impressed!

      1. 29

        QEMU emulation (TCG) is very slow! Its virtue is that it can run anything on anything, but it’s not useful for productivity or gaming. I used to use it to hack around a FEX RootFS as root, and even just downloading and installing packages with dnf was excruciatingly slow.

        Emulators that optimize for performance (such as FEX, box64, and Rosetta, and basically every modern game console emulator too) are in a very different league. Of course, the tradeoff is they only support very specific architecture combinations.

      2. 13

        As @lina says, QEMU is general. It works a few instructions at a time, generates an IR (TCG IR; TCG was originally designed for TCC, which started out as an IOCCC entry), does a small amount of optimisation, and emits the result.

        Rosetta 2 works on much larger units but, more importantly, AArch64 was designed to support x86 emulation and it can avoid the intermediate representation entirely. Most x86-64 instructions are mapped to 1-2 instructions. The x86-64 register file is mapped into 16 of the AArch64 registers, with the rest used for emulator state.

        Apple has a few additional features that make it easier:

        • They use some of the reserved bits in the flags register for x86-compatible flags emulation.
        • They implement a TSO mode, which automatically sets the fence bits on loads and stores.

        FEX doesn’t (I think) take advantage of these (or possibly does, but only on Apple hardware?), but even without them it’s quite easy (as in, it’s a lot of engineering work, but each bit of it is easy) to translate x86-64 binaries to AArch64. Arm got a few things wrong, but both Apple and Microsoft gave a lot of feedback, and newer AArch64 revisions have a bunch of extensions that make Rosetta 2-style emulation easy.

        RISC-V’s decision to not have a flags register would make this much harder.
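
        To make the 1-2 instruction mapping concrete, here’s a toy sketch (illustrative only; this is not how Rosetta 2 or FEX are structured, and the RAX→X0 / RBX→X1 assignment is made up) that pattern-matches a single x86-64 add and emits the one AArch64 instruction it maps to:

        ```c
        /* Toy direct-mapping translator for exactly one instruction:
         * "add rax, rbx" (bytes 48 01 D8). Illustrative only. */
        #include <stdint.h>
        #include <stdio.h>

        /* Encode AArch64 "ADDS Xd, Xn, Xm": sets NZCV, which covers the
         * x86 OF/SF/ZF/CF flags (AF/PF are where Apple's extra flag bits
         * come in). */
        static uint32_t a64_adds(unsigned d, unsigned n, unsigned m) {
            return 0xAB000000u | (m << 16) | (n << 5) | d;
        }

        int main(void) {
            const uint8_t x86[] = { 0x48, 0x01, 0xD8 };  /* add rax, rbx */

            if (x86[0] == 0x48 && x86[1] == 0x01 && x86[2] == 0xD8) {
                /* One x86 instruction -> one AArch64 instruction, with
                 * RAX mapped to X0 and RBX to X1; no IR in between. */
                printf("adds x0, x0, x1 -> 0x%08X\n", a64_adds(0, 0, 1));
            }
            return 0;
        }
        ```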

        1. 14

          There are two more hardware features: SSE denormal handling (FTZ/DAZ) and a change in SIMD vector handling. Those are standardized as FEAT_AFP in newer ARM architectures, but Apple doesn’t implement the standard version yet. The nonstandard Apple version is not usable in FEX due to a technicality in how they implemented it (they made the switch privileged and global, while FEX needs to be able to switch between modes efficiently, unlike Rosetta, and calling into the kernel would be too slow).

          FEX does use TSO mode on Apple hardware, though; that’s by far the biggest win, and something you can’t emulate performantly if the hardware doesn’t support it. Replacing all the loads/stores with synchronized ones is both slower and less flexible (fewer addressing modes), so it ends up requiring more instructions too.
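
          To illustrate the addressing-mode point in C (GCC/Clang __atomic builtins stand in here for an emulator’s generated loads; the annotated sequences are roughly what AArch64 compilers emit at -O2):

          ```c
          /* Why replacing plain loads with acquire loads costs instructions:
           * ldar only accepts a bare base register, while a plain ldr can
           * use scaled register addressing. */
          int plain_load(int *base, long idx) {
              /* ldr w0, [x0, x1, lsl #2]      ; one instruction */
              return __atomic_load_n(&base[idx], __ATOMIC_RELAXED);
          }

          int acquire_load(int *base, long idx) {
              /* add x8, x0, x1, lsl #2
               * ldar w0, [x8]                 ; two instructions */
              return __atomic_load_n(&base[idx], __ATOMIC_ACQUIRE);
          }
          ```

          With hardware TSO, the plain ldr already carries the ordering x86 code expects, which is why the hardware mode is such a big win.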

        2. 2

          > even without them it’s quite easy […] to translate x86-64 binaries to AArch64 […] RISC-V’s decision to not have a flags register would make this much harder.

          Dumb question: is there a reason not to always ahead-of-time compile to the native arch anyway? (I believe that is what RPCS3 does; see the LLVM recompiler option.)

          1. 2

            As I understand it, that’s more or less what Rosetta 2 does: it hooks into mmap calls and binary translates libraries as they’re loaded. The fact that the mapping is simple means that this can be done with very low latency. It has a separate mode for JIT compilers that works more incrementally. I’m impressed by how well the latter works: the Xilinx tools are Linux Java programs (linked to a bunch of native libraries) and they work very well in Rosetta on macOS, in a Linux VM.
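
            Rosetta’s actual hook is private to Apple’s runtime, but the general “intercept mmap, translate what just got mapped” shape can be sketched on Linux with an LD_PRELOAD interposer (everything here is illustrative):

            ```c
            /* Sketch of mmap interposition, the general shape of
             * translate-on-map. Build: cc -shared -fPIC hook.c -o hook.so -ldl
             * Run:   LD_PRELOAD=./hook.so some_program */
            #define _GNU_SOURCE
            #include <dlfcn.h>
            #include <stdio.h>
            #include <sys/mman.h>

            static void *(*real_mmap)(void *, size_t, int, int, int, off_t);

            void *mmap(void *addr, size_t len, int prot, int flags,
                       int fd, off_t off) {
                if (!real_mmap)
                    real_mmap = dlsym(RTLD_NEXT, "mmap");

                void *p = real_mmap(addr, len, prot, flags, fd, off);

                /* A binary translator would scan freshly mapped executable
                 * pages here and translate them before they first run. */
                if (p != MAP_FAILED && (prot & PROT_EXEC))
                    fprintf(stderr, "exec mapping at %p (%zu bytes)\n", p, len);

                return p;
            }
            ```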

            The DynamoRIO work 20 or so years ago showed that JITs can do better by taking advantage of execution patterns. Virtual PC for Mac did this kind of thing to avoid the need to calculate flags (which were more expensive on PowerPC) when they weren’t used. In contrast, Apple Silicon simply makes it sufficiently cheap to calculate the flags that this is not needed.

          2. 2

            Rosetta does do this, but at minimum you have to support runtime code generation (which has to be able to interact with the AOT-generated code) because of JITs (though ideally a JIT implementation should check whether it is being translated and not JIT). And even setting JITs aside, if you can only translate ahead of time, you get a huge latency spike/pause whenever a new library is loaded.

            So no matter what, you always have to support some degree of runtime codegen/translation; it’s just a question of whether you can get enough of a win from AOT, on top of the runtime codegen, to justify the additional complexity.