1. 51
    1. 13

      Author here. AMA.

      1. 10

        Two questions: First – RPCs, in my experience, aren’t free, but they’re not so expensive locally. OpenBSD uses them regularly within its daemons to implement privsep with process isolation, and 9front uses RPC ubiquitously. So, do you have an analysis of where the time goes and what kind of RPCs are done in typical user interactions? The new architecture makes sense; I’m just surprised that the overhead of the RPCs registers at all for human-scale interactions in the old model.

        Second: The old accessibility architecture was put in place by a relatively large and well funded team at Sun. Who is funding this work, and who will be porting third party applications, testing, and doing all of the boring work to keep things up to date once the initial architecture and proof of concept is shipped? What’s the funding model?

        1. 10

          First: The impact of RPC overhead on accessibility really is known to be significant. I don’t know if anyone has published an analysis of this in GNOME. But I used to be on the Windows accessibility team at Microsoft, and RPC overhead was a big problem for us, one that we eventually went to great lengths to solve (though not using the push approach). Here’s an example that I often used to illustrate the problem: In Narrator (the screen reader built into Windows), when a user wanted to read a chunk of text, Narrator had to do at least one RPC per word, and probably several, before the text was sent to the text-to-speech engine. It has now been a few years since I worked on that code, so I’ve forgotten the details of what all those RPCs were for.

          The solution we at Microsoft eventually resorted to was to run a custom bytecode interpreter inside the application process (in the UIAutomationCore library) to run small programs sent by the screen reader (or other assistive technology), which could fetch all the necessary information in one RPC. That solution went by the rather generic name “Remote Operations”. There’s some documentation of the core platform API, and the high-level client libraries are open source. More background information is available in the patent (which isn’t mine; I wasn’t the lead developer on that project). You can probably understand why I chose not to reimplement that approach for GNOME.
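
          Here’s a toy sketch in Python of that batching pattern (the opcodes and the tree layout are invented for illustration; this is not the real Remote Operations bytecode):

          ```python
          # Toy "remote operations" interpreter. The assistive technology sends a
          # small program across the process boundary once; an interpreter inside
          # the application walks the accessibility tree and returns everything in
          # a single reply. All opcodes and the tree layout are invented.
          TREE = {
              "para1": {"role": "paragraph", "children": ["w1", "w2", "w3"]},
              "w1": {"role": "text", "text": "Hello"},
              "w2": {"role": "text", "text": "brave"},
              "w3": {"role": "text", "text": "world"},
          }

          def run_remote_ops(program, root):
              """Runs in the application process: one round trip, many fetches."""
              queue, out = [], []
              for op, in program:
                  if op == "CHILDREN":
                      queue = list(TREE[root]["children"])
                  elif op == "FOR_EACH_GET_TEXT":
                      out.extend(TREE[node]["text"] for node in queue)
              return out

          # Naive model: 1 RPC for the children + 1 RPC per word = 4 round trips.
          # Batched model: the whole program is a single round trip.
          print(run_remote_ops([("CHILDREN",), ("FOR_EACH_GET_TEXT",)], "para1"))
          # -> ['Hello', 'brave', 'world']
          ```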

          Second: There is funding for the work that I’ll be doing on this. I don’t yet have permission to say any more about that.

          1. 2

            I would love to see that data. I presume the issue was that the RPCs are synchronous and so may incur a full context switch. If you have a paragraph of, say, 100 words, you may end up with 100 scheduling events that, with a 10 ms quantum, add up to a second of latency. If, instead, those RPCs used a mechanism like Doors or L4 messages, or if they were asynchronous, I’d expect the overhead to be noise.
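
            As a back-of-envelope sketch of that arithmetic (the direct-handoff cost below is an illustrative guess, not a measurement):

            ```python
            # Worst-case model: each synchronous RPC blocks until the peer is next
            # scheduled, so a 10 ms quantum bounds the cost of one word at 10 ms.
            words = 100
            quantum_s = 0.010    # assumed 10 ms scheduler quantum
            handoff_s = 5e-6     # illustrative Doors/L4-style direct handoff

            print(f"sync RPC per word, full quantum each: {words * quantum_s:.2f} s")
            print(f"direct-handoff IPC per word:          {words * handoff_s * 1e3:.1f} ms")
            ```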

            1. 4

              I no longer have access to any of the data that my colleagues at Microsoft collected. What I remember from my own measurements of RPC overhead on Windows, particularly as a third-party screen reader developer before I joined Microsoft, was that each RPC round trip, which did indeed require a context switch, took on the order of 100 microseconds. I haven’t yet measured on Linux, particularly using AT-SPI, which is based on D-Bus.
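
              For anyone who wants a first number on Linux, a crude timing of session-bus round trips could look like the following PyGObject sketch. It pings the bus daemon itself, so a real AT-SPI call into another application would cost roughly two such hops:

              ```python
              # Time synchronous round trips on the D-Bus session bus by calling
              # the standard org.freedesktop.DBus.Peer.Ping method on the daemon.
              import time
              from gi.repository import Gio

              bus = Gio.bus_get_sync(Gio.BusType.SESSION, None)
              n = 1000
              start = time.perf_counter()
              for _ in range(n):
                  bus.call_sync(
                      "org.freedesktop.DBus",        # destination: the bus daemon
                      "/org/freedesktop/DBus",       # object path
                      "org.freedesktop.DBus.Peer",   # standard Peer interface
                      "Ping",                        # no-op method, empty reply
                      None, None, Gio.DBusCallFlags.NONE, -1, None)
              elapsed = time.perf_counter() - start
              print(f"mean round trip: {elapsed / n * 1e6:.1f} µs")
              ```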

    2. 11

      The idea of a screenshot that captures the application tree state (e.g. the DOM tree) is really exciting. The internet seems full of screenshots (for whatever reason), and asking users to manually annotate them just doesn’t seem to work in practice.
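
      As a sketch of what such a “structured screenshot” might carry, imagine a JSON sidecar saved next to the PNG. The schema below is invented for illustration; no platform ships exactly this:

      ```python
      # Hypothetical structured-screenshot sidecar: serialize the on-screen
      # accessibility/DOM tree next to the pixels. The schema is invented.
      import json
      from dataclasses import dataclass, field, asdict

      @dataclass
      class Node:
          role: str                      # e.g. "button", "heading", "text"
          name: str = ""                 # accessible name / text content
          bounds: tuple = (0, 0, 0, 0)   # x, y, w, h in screenshot pixels
          children: list = field(default_factory=list)

      tree = Node("window", "Settings", (0, 0, 800, 600), [
          Node("heading", "Appearance", (24, 40, 200, 32)),
          Node("button", "Dark mode", (24, 90, 160, 28)),
      ])

      # A viewer could hit-test clicks against `bounds` to map pixels back
      # to semantic elements instead of running OCR on the image.
      with open("screenshot.json", "w") as f:
          json.dump(asdict(tree), f, indent=2)
      ```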

      1. 7

        In the last update, Windows added automatic OCR to screenshots. OCR works very well for screenshots because there’s no noise from a printer or scanner, but it struck me as an incredibly inefficient way of going from structured text to structured text.

        Apple’s screenshot format for the whole screen is (was?) PDF, but they use PNG for shots of individual windows. This always made me a bit sad because Quartz renders using a PDF drawing model and you can get resolution-independent PDFs by just changing the output device. I wish they’d done this for screenshots, so text would be text in generated PDFs.

        1. 4

          This always made me a bit sad because Quartz renders using a PDF drawing model and you can get resolution-independent PDFs by just changing the output device.

          That’s assuming applications aren’t internally composing everything down to pixel buffers before sending them off as PDF. X11 also had a relatively high-level view of what was going on on the screen, but these days it’s just pixel buffers being sent around, which informed the Wayland design.

        2. 3

          At least on Sonoma (current) macOS, both full-screen (cmd-shift-3) and window (cmd-shift-5) screenshots are saved as PNG.

          1. 3

            The one place where Apple uses PDF “screenshots” is when you take a screenshot in Safari on iOS and then choose “Full Page”. But that’s probably no different than exporting a page to a PDF, just a different UI.