1. 19
    1. 5

      I’d kinda prefer something with a REPL, but that seems to be uncommon for datalog for some reason.

      I like to use SQL the way some people use spreadsheets: load some data, do some calculations, maybe add some validation to automatically check for outliers/data errors/misunderstandings (for example, the 6502 instruction set as a database). But SQL as a model has its own quirks and oddities, and people keep talking about how Datalog is much more flexible and powerful, so I’m interested in trying it out.

      I hoped/expected there would be a Datalog tool something like SQLite - load data and validation rules into a file on disk that you can query interactively, do one-shot queries from a shell-script, or use as a data-store from inside some other language. Unfortunately, it seems that’s not a use-case anyone else seems to be interested in - as you point out, everybody seems to be interested in libraries for in-memory data stores, or other patterns that don’t seem relevant to my use-case.

      1. 5

        I wouldn’t be surprised if a SQLite-ish Datalog tool existed, but I couldn’t find one. That said, I certainly didn’t do an exhaustive search. Would be neat to build one…

        Edit: I wonder if this is due to evaluation strategies for Datalog usually being bottom-up rather than top-down? In Prolog you write “find me variables where this rule is true” and it starts with that rule and walks down the call tree. Datalog seems to be more about giving it a bunch of data and then saying “find me all rules that are true”, so it starts at the leaves of the rule graph and walks upwards until something is false. It’s not impossible to turn one into the other, but it might be more work.

    2. 3

      Neat. It is a type checker, not an evaluator. However, since STLC programs always terminate, it’d be interesting to see an encoding of evaluation in Datalog too!

      1. 2

        It would, but my intent is to move on to typechecking more complicated type systems as well, so evaluation isn’t my priority right now. Sorry!

        1. 1

          No worries! I just noticed the connection. Future work :⁠-⁠)

    3. 3

      This sounds fun.

      • I’ve written small type-checkers or interpreters in Lambda-Prolog, which is a superset of Prolog (whereas Datalog is more of a subset) that also provides builtin support for variable bindings (in the object language, the language/programs being manipulated). Elpi is a lambda-prolog implementation that is used for metaprogramming Coq/Rocq terms, so we know that this approach scales to large, complex type systems.

      • “Now, the thing is that datalog, or at least crepe, doesn’t really handle nested structures nicely.” Urgh. Frankly I think that this makes things a little ugly and a little more complex (not that much) for no good reason, and the performance justifications are more like Stockholm syndrome speaking. (Performance justifications that seem rather hypothetical, anyway, given that the first thing you do with your nicely packed AST is to copy each of them separately as a separate Datalog fact, turning every subterm access into a fact lookup.)

      • For now you don’t support explaining where type-errors are in the program, and that sounds like a dealbreaker from a distance. Maybe it is easy to add some form of type-error reporting (not asking for glorious error messages, but at least: which subterm were you checking when you failed?), or maybe this while require a global change to the structure of the TypeOf predicate. You would probably want to know which it is before you apply that technique in a setting where at least one other person will use the type-checker.

      1. 1

        Frankly I think that this makes things a little ugly and a little more complex (not that much) for no good reason…

        Possibly! My vague intent if this becomes something more than a toy would be to hand-write the basic typing rules, and then write code to generate the rest of the datalog to evaluate the input program. So ugliness and complexity doesn’t bother me that much, as long as it remains easy to generate. So I’m kinda thinking of datalog as a target language to generate and asking “is this even possible?”.

        No idea how the performance argument actually works out in practice, but it would be interesting to compare someday. Have a somewhat-non-trivial datalog program with a bunch of actual data, compare it to the same literal program running in a Prolog engine, then rewrite it in nicer Prolog and compare perf again.

        For now you don’t support explaining where type-errors are in the program, and that sounds like a dealbreaker from a distance.

        Big heckin’ yes. No idea how to even ask that question in Datalog currently, since by nature it’s difficult to ask “where” something happens. It miiiight be doable if you phrase it as something like “find me the expression e where e‘s children typecheck but e itself does not”. Alternatively if you’re writing a Datalog engine yourself for a compiler, maybe you can instrument it so you can say “tell me more about how you evaluated rule TypeError(expr)”.

        Edit: Some playing around shows that this can work but it’s still a bit tricky to ask “what type did you expect?” You can write special-case rules for particular types of failures though, like “tried to call a value that isn’t a function” and have a fallback to OCaml-style “expected X, got Y” messages, but it seems like it’d be a lot of work and kinda tricky to collect the information you want. On the other hand… when is that not true for error reporting?

        Mainly for my own memory, you can write “function call has arg type X but was given type Y” like so:

        BadApply(id, expected_type, got_type) <-
          EApply(id, f, arg), // Look for a function call (apply f arg)
          TypeError(id), // that produces a type error
          TypeOf(f, funtype), // where the type of f is "funtype"
          FunctionType(funtype, expected_type, _) // where funtype is a function with an arg of type "expected_type"
          TypeOf(arg, got_type); // and the type of the argument is "got_type"