1. 16
    Co-dfns: High-performance, Reliable, and Parallel APL apl github.com/co-dfns
    1. 6

      “Readme is a bit sparse, I suppose I’ll look at the tests to understand what this is about.”

      The tests:

       ∆0_TEST←{#.UT.expect←0 ⋄ _←#.⎕EX'c0002' ⋄ #.c0002←'t0002'#.codfns.Fix ⎕SRC #.t0002 ⋄ 0} ⍝ compile the source of namespace t0002 into c0002
       ∆1_TEST←{#.UT.expect←1 1⍴'F' ⋄ #.c0002.⎕NL 3} ⍝ the compiled namespace exposes exactly one function, F
       ∆2_TEST←{#.UT.expect←0 ⋄ ⍎'#.c0002.F⍬ ⋄ 0'} ⍝ calling F on an empty vector runs without error
       ∆3_TEST←{#.UT.expect←,¨0 0 ⋄ _←#.⎕EX¨n←'c0002' 't0002' ⋄ #.⎕NC¨n} ⍝ clean up: expunge both namespaces and check they are gone
      
      1. 5

        You can read Aaron Hsu’s thesis to understand the codebase better. The author has said their thesis is the Literate Programming version of co-dfns.

        Especially section 3, which explains how trees are represented and why. Even if you won’t necessarily use the same representation in a more mainstream language, it shows some important considerations for writing branchless, parallelizable code.

        https://scholarworks.iu.edu/dspace/items/3ab772c9-92c9-4f59-bd95-40aff99e8c7a
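
        To give a flavour of the idea (my own toy simplification, not code from the thesis): the trees live in flat vectors, e.g. a parent vector, so structural queries become whole-array operations with no per-node branching.

         p←0 0 0 1 1   ⍝ parent vector: p[i] is the parent of node i (⎕IO←0); the root points to itself
         (p=1)/⍳≢p     ⍝ children of node 1: one vector comparison, no loop, no branches
         3 4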

        1. 5

          It’s an APL compiler that generates GPU code. The compiler is written in the same APL dialect, so the compiler itself runs on a GPU. I don’t know of any other example of a compiler that runs on a GPU.
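
          From the tests quoted further up, using it looks roughly like this (a sketch; the namespace, the dfn, and the output are mine, not from the repo):

           _←⎕FIX ':Namespace t' '    F←{⍵+⍵}' ':EndNamespace'   ⍝ a tiny scripted namespace with one dfn
           c←'t' #.codfns.Fix ⎕SRC t                             ⍝ compile it, mirroring the tests above
           c.F 1 2 3
           2 4 6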

          The entire compiler is about 750 lines of code, as I recall. Each optimization pass is a line or two, and the full optimizer fits on a single page without scrolling, so you can see all of that code at once, which is very valuable for refactoring.

          It is an impressive feat of engineering.

          1. 3

            It gets significantly harder to read than that, although APLers (myself included) have a slightly different notion of readability.

            If you’re genuinely looking for more information, the two Dyalog user meeting videos under ‘publications’ flesh things out a bit more.

            1. 1

              Also, is the rectangle an actual APL symbol, or does GitHub break the source code by defaulting to UTF-8?

              1. 4

                The rectangle is an actual symbol; it prefixes system names.

                1. 4

                  ⎕ is called Quad, and yes it’s used in APL! When used by itself, it’s something like standard out.

                  ⎕ ← 1 + (1 2 3)
                  2 3 4
                  

                  It’s not very interesting as such otherwise, and even a bit messy: it’s where all the system functions and variables go, the so-called quad names.

                  ⎕EX, for instance, is expunge. It tries to erase a name from the scope, and returns 1 if successful, 0 otherwise.
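
                  A quick session sketch:

                   x←42
                   ⎕←⎕EX 'x'   ⍝ expunge x: 1 means the name was erased
                   1
                   ⎕←⎕NC 'x'   ⍝ name class 0: x is now undefined
                   0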

                  ⎕NL, on the other hand, lists the names in scope that belong to the name classes you specify (3 is functions, 2 is variables, and so on).
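
                  For example, in an otherwise empty workspace (sketch):

                   v←10 ⋄ f←{⍵+1}
                   ⎕NL 3   ⍝ name class 3: functions
                   f
                   ⎕NL 2   ⍝ name class 2: variables
                   v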

                  So yeah, ⎕. Not to be confused with ⎕IO, which specifies whether your APL is 0- or 1-indexed :).
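
                  You can see the difference directly:

                   ⎕IO←1 ⋄ ⍳3
                   1 2 3
                   ⎕IO←0 ⋄ ⍳3
                   0 1 2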

                  EDIT: I should add, some of this information is Dyalog-specific.

            2. 1

              My problem with Co-dfns is that its usefulness is well hidden under a layer of non-trivial syntax and an absence of performance metrics. APL itself is quite performant and reliable, and has supported parallelism for around 8 years now, I guess? So I just can’t see why Co-dfns is better than pure APL or even “usual” dfns.

              1. 1

                Dyalog APL supports parallelism on multi-core CPUs. But the most demanding parallel workloads (like machine learning) require a GPU or TPU. All the papers I’ve seen about machine learning in APL either use Co-dfns or some other toolchain that compiles APL into GPU code. These APL-to-GPU compilers support only a subset of APL, not the full language (what you call “pure APL”).

                Co-dfns can produce both CPU and GPU code. This paper contains a benchmark (Table 3) comparing Co-dfns on CPU and GPU, and also comparing to Dyalog APL and PyTorch. In this benchmark, Co-dfns on GPU is 10 times faster than Dyalog APL, but it still loses to PyTorch.

                1. 1

                  That’s a great example, thanks! Compilation to GPU is of course a useful application. Also, I’m really impressed by the 2-3 times faster results on CPU.