Does anyone have a mirror? I’m getting a 403 forbidden error.
http://web.archive.org/web/20070311144012/geocities.com/tablizer/top.htm
I’ve thought about this stuff a lot: having languages with the right abstractions.
I’ve found that array slices are pretty excellent for these abstractions, but I have yet to find a language that has nice syntax for trees. For example, I would love an AST version of awk for refactoring.
The closest I’ve come to this is… maybe lenses in Haskell or Clojure’s Specter.
There’s so much concise stuff out there that’s easier to just do in Excel than in my favorite programming languages, and it makes me pretty sad!
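For Python source specifically, the standard library’s ast module gets part of the way to an “AST awk”; a minimal sketch of a rename-refactoring pass (the names here are made up for illustration):

```python
import ast

class RenameCalls(ast.NodeTransformer):
    """Rewrite every call to `old` into a call to `new`."""
    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            node.func.id = self.new
        return node

tree = ast.parse("total = old_sum(xs) + old_sum(ys)")
tree = RenameCalls("old_sum", "new_sum").visit(tree)
print(ast.unparse(tree))  # Python 3.9+: total = new_sum(xs) + new_sum(ys)
```

The caveat is that this round-trips through a lossy parse, so comments and formatting are gone, which is exactly the gap a lossless syntax tree (mentioned below) would close.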
> I would love an AST version of awk for refactoring.

At the risk of sounding flippant, isn’t that the core idea behind Lisp’s macros? I might be misunderstanding you, but I always assumed s-expressions were designed that way in order to mirror ASTs as closely as possible.
What about pattern matches in Erlang/Elixir for AST changes (assuming one can get an AST of the language in question)? Part of the difficulty with AST changes for a lot of software is that the AST is usually very complicated compared to a table.
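Python’s structural pattern matching (3.10+) gives a rough feel for the Erlang/Elixir style over an s-expression-like AST; a toy sketch, with made-up tuple shapes:

```python
# Toy algebraic-simplification pass over an s-expression-like AST, in the
# spirit of Erlang/Elixir pattern matching. The ("op", left, right) shape
# is made up for illustration.
def simplify(node):
    match node:
        case ("+", x, 0) | ("+", 0, x):
            return simplify(x)          # x + 0  ->  x
        case ("*", x, 1) | ("*", 1, x):
            return simplify(x)          # x * 1  ->  x
        case (op, left, right):
            return (op, simplify(left), simplify(right))
        case _:
            return node                 # leaf: a name or a number

print(simplify(("+", ("*", "a", 1), 0)))  # prints: a
```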
XPath or jq seem like the closest notions to something awk-like for trees.
I have some ideas for both of these things in Oil (but I have no idea when/if I’ll actually have time for them).
As far as tables, R is far and away the best language for dealing with them. (Although many people don’t know much about R, its predecessor S was developed at Bell Labs, just as Unix was.) It’s basically like SQL, but more general and with a better syntax:
https://github.com/oilshell/oil/wiki/Oil-and-the-R-Language
As far as trees and awk, I’ve been thinking about generalizing Awk [1] in a couple of ways. This post [2] talks about generalizing it to file system metadata, in the style of `find`.

But you can also think about generalizing it to trees, in particular a lossless syntax tree [3]. I originally thought of that as a “token-aware” or “tree-aware” sed, but I think sed is mostly a special case of the predicate/action model of awk anyway (a sketch of that model applied to a tree follows the links below).
[1] http://www.oilshell.org/blog/2016/11/14.html
[2] https://lobste.rs/s/jfarwh/find_is_beautiful_tool#c_rkmlpz
[3] https://github.com/oilshell/oil/wiki/Lossless-Syntax-Tree-Pattern (see Hacker News link at top)
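To make the predicate/action framing concrete, here’s an entirely hypothetical sketch of what an “awk for trees” kernel could look like, in Python:

```python
# Hypothetical "awk for trees" kernel: awk runs every action whose pattern
# matches the current line; this runs every action whose predicate matches
# the current node, while walking the tree. The node shape is made up.
def tree_awk(node, rules, depth=0):
    for predicate, action in rules:
        if predicate(node):
            action(node, depth)
    for child in node.get("children", []):
        tree_awk(child, rules, depth + 1)

tree = {"kind": "block", "children": [
    {"kind": "call", "name": "old_sum", "children": []},
    {"kind": "if", "children": [
        {"kind": "call", "name": "old_sum", "children": []}]}]}

# One rule, in awk's `pattern { action }` spirit:
rules = [(lambda n: n.get("kind") == "call",
          lambda n, d: print("  " * d + n["name"]))]
tree_awk(tree, rules)   # prints each call name, indented by depth
```

The interesting design questions (node addressing, in-place edits, losslessness) are exactly what [3] is about; this only shows the control structure.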
There’s also an extension to GNU awk that traverses XML, which may be worth a look for ideas.
Microsoft’s “language integrated query” (LINQ) made some effort to unify in-memory operations with DB operations, as the “collection protocol convergence” diagram is getting at. (The linked introduction is from 2007, five years after the archived site linked here.) An expression like `from row in collection where row.val1 > 3 select row.val2` can run on an in-heap collection or be munged into SQL by code that can see the expression syntax trees. Besides being a DB interface, it’s sort of a clever backdoor way to introduce something like Python’s list and generator comprehensions to .NET, with a syntax that SQL users will find roughly familiar.

I don’t know how LINQ is doing, but it is kind of funny how huge the differences in language and mindset are between on-heap collections and anything else. There are real, inherent differences that stem from DBs being persistent, remote, and accessed by multiple clients concurrently, but it’s not as clear that, for instance, a more declarative syntax and external-memory data structures/algorithms couldn’t be worth having easily accessible for regular old data inside your program. Arguably the starting point could be “data is data”, and then you justify the differences.
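For the in-heap half, the Python analog mentioned above really is just a comprehension; with illustrative stand-in data:

```python
from collections import namedtuple

# Stand-ins for the rows in the LINQ expression above (illustrative only).
Row = namedtuple("Row", ["val1", "val2"])
collection = [Row(1, "a"), Row(5, "b"), Row(9, "c")]

# from row in collection where row.val1 > 3 select row.val2
result = [row.val2 for row in collection if row.val1 > 3]
print(result)  # ['b', 'c']
```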
Right now, when you move data to a DB you’re accessing it through a sometimes-clunky wrapper, and you’ve entered a world where an O(n^2) approach can look exactly like an O(n) one (the difference being just whether some index exists), and the database happily accepts either. Conversely, some stuff that’s trivial in an RDBMS requires more fuss for data in the heap: even just to intersect two big sets I am, in effect, manually writing a query plan (by picking a data structure/algorithm), and it can be fussy to maintain a secondary index or spill a big job to disk the right way.
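To make the set-intersection example concrete, the “query plan by hand” choice looks like this in Python (data made up):

```python
# The choice of data structure is, in effect, the query plan.
a = list(range(0, 1_000_000, 2))   # made-up "tables"
b = list(range(0, 1_000_000, 3))

# "No index": a nested scan, O(len(a) * len(b)) -- hopeless at this size.
# common = [x for x in a if x in b]   # `x in b` is a linear scan each time

# "Build an index first": hash one side, O(len(a) + len(b)).
index = set(b)
common = [x for x in a if x in index]
print(len(common))   # the multiples of 6 below 1,000,000
```

A database would pick between plans like these for you, which is the convenience and the opacity at the same time.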
To some extent I’m going over well-worn stuff about the impedance mismatch, and I don’t pretend to be clever enough to work out some great new approach to all of this. A database is truly different from the heap, obviously, and, for predictability, you still need to have available the fully imperative approach where every operation is as explicit as is practical. But I can see the appeal of more of a continuum where right now it tends to look (from here) like a few points with big empty spaces between them.
LINQ seems to be doing all right in my experience, though I don’t consider it the best way to access SQL DBs, since the SQL generation is just another layer one needs to debug for Select N+1 problems.
IIRC, most DBs don’t have O(n) operations involving more than one table. At best, one gets O(n log n).
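That bound is the classic sort-merge join, where the sorts dominate; a toy version (assumes unique keys on each side, for simplicity):

```python
# Toy sort-merge join: O(n log n) for the sorts, linear for the merge.
# Real planners also use hash joins; this is only illustrative.
def merge_join(left, right, key=lambda row: row[0]):
    left, right = sorted(left, key=key), sorted(right, key=key)
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:                      # matching keys: emit the joined pair
            out.append((left[i], right[j]))
            i += 1
            j += 1
    return out

print(merge_join([(1, "a"), (2, "b")], [(2, "x"), (3, "y")]))
# [((2, 'b'), (2, 'x'))]
```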
It would be interesting to see a persistent, table-based data store that works well with in-memory data, a la SQLite, but built for C# or Java, though I’m not sure if the CLR or JVM are runtimes that would work for mixing app and db code in that way.
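Python’s sqlite3 module shows the in-process shape being described: no server, no network, app code and database engine in one process (though not sharing a type system the way an on-heap collection would):

```python
import sqlite3

# SQLite runs inside the application process; ":memory:" keeps it in RAM,
# or pass a filename for a persistent single-file store.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (val1 INTEGER, val2 TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (5, "b"), (9, "c")])
print(con.execute("SELECT val2 FROM t WHERE val1 > 3").fetchall())
# [('b',), ('c',)]
```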
Yeah, and there’s definitely a risk of boiling the ocean trying to fit different scales and use cases into one mold. It’s just funny that everyone seemed to agree collections and some basic algorithms should come built in, but ask for just a bit more and suddenly you’re sending string queries over a network (or at least writing very different code).
There are some exceptions to this. Go has a few embedded k/v stores you can find on GitHub; BoltDB is one I’ve looked at a bit, but there are others. Elixir/Erlang have ETS/DETS and Mnesia. Octave/Matlab can save/load data pretty easily, IIRC. And k/q has a pretty notable built-in data store.
I suspect part of the reason one sees a much stronger client/server separation in Java/C# applications by default is that the error model of those languages makes it tricky to keep client/app errors from taking down the db.
This reminds me a bit of Out of the Tar Pit, but without the functional programming aspect that they get to in that paper.
It also reminds me a bit of Eve, which seems like a much more modern take on the idea.
Does anyone here have enough experience with the xBase languages to say whether what the article mentions was an actual benefit? I’ve not heard great things about the various dBASE derivatives (FoxPro, Clipper, and so on).
On the other hand, I’ve had an idea kicking around in the back of my head about Building A Better Spreadsheet, as it were, or if nothing else, spreadsheet/table-oriented software for the modern era that isn’t tied to MS or Google. These thoughts haven’t materialized yet, but I wonder.
[Comment removed by author]