1. 16
    Co-dfns: High-performance, Reliable, and Parallel APL apl github.com/co-dfns
    1. 6

      “Readme is a bit sparse, I suppose I’ll look at the tests to understand what this is about.”

      The tests:

       ∆0_TEST←{#.UT.expect←0 ⋄ _←#.⎕EX'c0002' ⋄ #.c0002←'t0002'#.codfns.Fix ⎕SRC #.t0002 ⋄ 0} ⍝ compile the source of namespace t0002 into c0002
       ∆1_TEST←{#.UT.expect←1 1⍴'F' ⋄ #.c0002.⎕NL 3} ⍝ the compiled namespace exposes exactly one function, F
       ∆2_TEST←{#.UT.expect←0 ⋄ ⍎'#.c0002.F⍬ ⋄ 0'} ⍝ calling F on an empty vector runs without error
       ∆3_TEST←{#.UT.expect←,¨0 0 ⋄ _←#.⎕EX¨n←'c0002' 't0002' ⋄ #.⎕NC¨n} ⍝ clean up: expunge both namespaces and check they are gone
      
      1. 5

        You can read Aaron Hsu’s thesis to understand the codebase better. The author has said their thesis is the Literate Programming version of co-dfns.

        Especially section 3, which explains how trees are represented and why. Even if you won’t necessarily use the same representation in a more mainstream language, it shows some important considerations for writing branchless, parallelizable code.

        https://scholarworks.iu.edu/dspace/items/3ab772c9-92c9-4f59-bd95-40aff99e8c7a
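
        To give a flavour of the idea (my own toy simplification, not code from the thesis): the trees live in flat vectors, e.g. a parent vector, so structural queries become whole-array operations with no per-node branching.

         p←0 0 0 1 1   ⍝ parent vector: p[i] is the parent of node i (⎕IO←0); the root points to itself
         (p=1)/⍳≢p     ⍝ children of node 1: one vector comparison, no loop, no branches
         3 4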

        1. 5

          It’s an APL compiler that generates GPU code. The compiler is written in the same APL dialect, so the compiler itself runs on a GPU. I don’t know of any other example of a compiler that runs on a GPU.
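
          From the tests quoted further up, using it looks roughly like this (a sketch; the namespace, the dfn, and the output are mine, not from the repo):

           _←⎕FIX ':Namespace t' '    F←{⍵+⍵}' ':EndNamespace'   ⍝ a tiny scripted namespace with one dfn
           c←'t' #.codfns.Fix ⎕SRC t                             ⍝ compile it, mirroring the tests above
           c.F 1 2 3
           2 4 6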

          The entire compiler is about 750 lines of code, as I recall. Each optimization pass is a line or two, and the full optimizer fits on a single page without scrolling, so you can see all of that code at once, which is very valuable for refactoring.

          It is an impressive feat of engineering.

          1. 3

            It gets significantly harder to read than that, although APLers (myself included) have a slightly different notion of readability.

            If you’re genuinely looking for more information, the two Dyalog user meeting videos under ‘publications’ flesh things out a bit more.

            1. 1

              Also, is the rectangle an actual APL symbol, or does GitHub break the source code by defaulting to UTF-8?

              1. 4

                The rectangle is an actual symbol; it prefixes system names.

                1. 4

                  ⎕ is called Quad, and yes it’s used in APL! When used by itself, it’s something like standard out.

                  ⎕ ← 1 + (1 2 3)
                  2 3 4
                  

                  It’s not very interesting as such otherwise, and even a bit messy: it’s where all the system functions and variables go, the so-called quad names.

                  ⎕EX, for instance, is expunge. It tries to erase a name from the scope, and returns 1 if successful, 0 otherwise.
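
                  A quick session sketch:

                   x←42
                   ⎕←⎕EX 'x'   ⍝ expunge x: 1 means the name was erased
                   1
                   ⎕←⎕NC 'x'   ⍝ name class 0: x is now undefined
                   0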

                  ⎕NL, on the other hand, lists the names in scope that belong to the name classes you specify (3 is functions, 2 is variables, and so on).
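
                  For example, in an otherwise empty workspace (sketch):

                   v←10 ⋄ f←{⍵+1}
                   ⎕NL 3   ⍝ name class 3: functions
                   f
                   ⎕NL 2   ⍝ name class 2: variables
                   v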

                  So yeah, ⎕. Not to be confused with ⎕IO, which specifies whether your APL is 0- or 1-indexed :).
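
                  You can see the difference directly:

                   ⎕IO←1 ⋄ ⍳3
                   1 2 3
                   ⎕IO←0 ⋄ ⍳3
                   0 1 2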

                  EDIT: I should add, some of this information is Dyalog-specific.

            2. 1

              My problem with Co-dfns is that its usefulness is well hidden under a layer of non-trivial syntax and an absence of performance metrics. APL itself is quite performant and reliable, and has supported parallelism for around 8 years now, I guess? So I just can’t see why Co-dfns is better than pure APL or even “usual” dfns.

              1. 1

                Dyalog APL supports parallelism on multi-core CPUs. But the most demanding parallel workloads (like machine learning) require a GPU or TPU. All the papers I’ve seen about machine learning in APL either use Co-dfns or some other toolchain that compiles APL into GPU code. These APL-to-GPU compilers support only a subset of APL, not the full language (what you call “pure APL”).

                Co-dfns can produce both CPU and GPU code. This paper contains a benchmark (Table 3) comparing Co-dfns on CPU and GPU, and also comparing to Dyalog APL and PyTorch. In this benchmark, Co-dfns on GPU is 10 times faster than Dyalog APL, but it still loses to PyTorch.

                1. 1

                  That’s a great example, thanks! Compilation to GPU is of course a useful application. Also, I’m really impressed by the 2-3 times faster results on CPU.