Balatro is another recent game written in Lua. It clocks in at about 30,000 lines of code (counted with wc -l), not including the localization files. Despite being a commercial game, the source code, comments and all, is readable by simply unzipping the Steam distribution. It’s written using the Love2D engine and runs great on many systems. Also a fantastic game.
I learned some Lua, gosh, 15+ years ago to write World of Warcraft addons. It made a lot of sense to me as a lightweight scripting language you’d embed into a bigger compiled game. One thing I like about Lua is that the language is so simple there’s generally only one way to do something, which frees you from worrying about “am I using this language right?”
One thing I like about Lua is that the language is so simple there’s generally only one way to do something
Haha; just a while ago I wrote a post about five different ways to do methods in Lua: https://technomancy.us/197 (well, technically Fennel, but they’re the same runtime and all of it applies equally well to Lua)
But the fact that you don’t have to use classes is very much appreciated, as is the flexibility to make trade-offs between reloadability and inspectability/serializability where needed. And granted some of the approaches in my post only make sense in very specific circumstances. (Erlang-style solutions for Erlang-type applications like my IRC server.)
This giant frickin’ switch is what drives most of the game logic. I love it. Sometimes the best games have just the worst code by “cleanness” standards.
O.C.: And what about the lack of classes in Lua?
Actually, I always loved simple solutions and was never a fan of OOP.
Lack of classes is one of my favorite features of Lua.
I just came to it when I got a “a table in the table is not a table but a reference to the table” bug and was trying to fix it. I just set some value using “=” and after that, my table was referenced and not copied, so two parts of the game started editing the same table, and I got some “interesting and unexpected” outcomes.
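The bug described above comes straight from Lua’s reference semantics: `=` on a table copies the reference, not the table, so both names end up editing the same data. A minimal sketch (the `shallow_copy` helper is illustrative, not a standard function):

```lua
-- `=` copies the reference, not the table: both names point at one table.
local enemy = { hp = 10 }
local clone = enemy          -- alias, not a copy
clone.hp = 3
assert(enemy.hp == 3)        -- the "other" table changed too

-- If you want a copy, make a copy (shallow_copy is a hypothetical helper):
local function shallow_copy(t)
  local out = {}
  for k, v in pairs(t) do out[k] = v end
  return out
end

local real_copy = shallow_copy(enemy)
real_copy.hp = 10
assert(enemy.hp == 3)        -- the original is untouched this time
```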
I don’t think I’ll ever understand the way that programmers coming from C and C-like languages expect data to be copied implicitly. It’s so utterly foreign to me. If you want a copy, make a copy!
But I understand the way that unlearning bad habits from previous languages can be a large part of the work in learning a new language.
Dmitry told me that Lua was created at the Pontifical Catholic University of Rio de Janeiro and that it was acceptable for Pontifical Catholic Universities to design programming languages this way.
I’ve never heard an Ex Cathedra argument applied to programming language design before!
I would use a much more module-based code organization. Because of my current code organization, I was using a lot of “messages” (a way of communication between game objects provided by Defold). I think I shouldn’t rely on them so much in the future.
It takes a while to get used to the way modules work in Lua, but once I learned them, they quickly became one of my favorite language features. However, for a 60kloc codebase I think you still need self-enforced discipline and structure if you rely on modules alone.
I have had great luck using Lua’s (very unique) support for first-class environments as a way to isolate different parts of the codebase from each other. If you break your code apart into multiple environments, you can strictly define the smallest set of functions which are necessary to bridge them. This means that instead of (in my case) one 12kloc game, I had four separate 3kloc codebases to deal with, and I never felt the unstructured flexibility of Lua to be burdensome at that scale. I haven’t seen very many people discuss this as a strategy for dealing with the lack of a type system, but I wrote some about it here: https://technomancy.us/181
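The environment-isolation idea can be sketched in a few lines. This assumes Lua 5.2+, where `load` accepts an environment table as its fourth argument (Lua 5.1 would use `setfenv` instead); the chunk name and `damage` function are made up for illustration:

```lua
-- A chunk loaded with an explicit environment can only reach what the
-- "bridge" table exposes; the rest of the stdlib is invisible to it.
local bridge = { assert = assert, math = math }

local code = [[
  assert(os == nil and io == nil)  -- the wider stdlib is invisible here
  function damage(hp, amount) return math.max(0, hp - amount) end
]]

local env = setmetatable({}, { __index = bridge })
assert(load(code, "combat", "t", env))()  -- run the chunk inside env
assert(env.damage(10, 3) == 7)            -- call in through the environment
assert(damage == nil)                     -- nothing leaked into our globals
```

Globals defined by the chunk land in `env`, so the host decides exactly which of them to call, and nothing escapes into the real global table.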
I don’t think I’ll ever understand the way that programmers coming from C and C-like languages expect data to be copied implicitly.
This is coming from a rustacean, but I feel like a scripting language with first-class reference types might be very interesting and possibly extremely useful. That way you always know whether something is going to be copied or referenced. Go does this with its pointers, but also makes the Billion Dollar Mistake; any experienced Go people out there with opinions on this?
That way you always know whether something is going to be copied or referenced.
Can you elaborate on what you mean by that? All the scripting languages I know have references for their data structures; you always know something is going to be referenced because that’s the only reasonable way to pass arguments to a function. If you want a copy, you can do that by … making a copy.
Are you saying you should be able to have a data structure and attach behavior to it saying “always copy this implicitly every time it’s passed to a function”? That seems like it would be confusing and error-prone, but maybe I’m misunderstanding you.
If you want it to be easier to make a copy, I think it should be done using a syntactic shorthand at the call site rather than being an inherent property of the data structure.
you always know something is going to be referenced because that’s the only reasonable way to pass arguments to a function
It isn’t though! There are plenty of times when (small, static) structures are passed by value. Essentially all the time in immutable languages, for example (I’ve been messing with Elixir a lot recently), but also in Rust and C. What if everything was pass-by-value by default instead of being by-value for some types and by-reference for others? Basically my goal is to be able to explicitly have unique-pointer semantics of some kind or another for a scripting language. This isn’t a terribly well-thought-out idea, but maybe we start with all values being immutable, and then have something like:
x1 = {foo: 1, bar: 2} # immutable dictionary
y1 = x1.foo # Copy foo
x2 = {foo: ref 1, bar: 2}
y2 = x2.foo # Copy foo
z2 = ref x2.foo # Copy reference to foo
y2 = "whatever" # rebind variable y2, or this is an error, whichever you prefer
z2 = "rawr" # allowed: mutates through the reference
print(x2) # prints {foo: ref "rawr", bar: 2}
OCaml does something kinda like this, but it’s kinda squirrelly, so I don’t know that this would really be better or more convenient – OCaml is already kinda squirrelly with mutation anyway. I suppose my goal is to make a scripting language that starts out with “immutable everywhere”, ‘cause if you have that then you really don’t care what is a value and what is a reference, and then allow you to relax that in certain places so you can get the best of both worlds. I’m also imagining maybe a ref type that can be shared and a box type that is mutable but not share-able. Or maybe you start with move semantics no matter what; I don’t know.
Basically Rust’s data model changed my world and I think there’s a lot of room for languages with the most world-changing parts of its data model but a lot more relaxation around Rust’s theme of “never, ever allocate dynamic memory or touch a refcount unless you’re told to”.
Essentially all the time in immutable languages for example
Well sure, but in the case of immutable data that’s an optimization that should not meaningfully affect the semantics of the language. If the runtime or compiler wants to make copies behind my back because it’ll be faster, it’s free to do so as long as I won’t notice.
For mutable data it’s just weird; why would I want the language to make a copy for me without asking, but then for the copy to be mutable too? Every case I can think of where I would in theory want something like that would be better off served by explicitly creating an immutable copy instead, which the language would then be free to optimize however it likes. (I definitely think languages should make this easier, but still never implicit.)
I think the level of confusion that C programmers exhibit when learning languages that don’t have the flaw of implicit argument copying is pretty good evidence that it was a language design mistake we should be leaving in the past. Say what you mean!
Edit: I re-read your example and it sounds like you’re talking about something different since your example doesn’t contain any function calls except print. I’m talking about the implicit copying that happens to arguments in C when a function is called. I don’t see anything objectionable in your example, except I think it’s a bad idea to use the same notation to bind a local variable as you do for mutating a field of a data structure. OCaml has the right idea where it uses = for locals and <- for field mutation.
Boggle, where did you get this idea from?
Probably not so much for C, but C++ objects can have copy constructors, so the implicit behaviour of copying a reference, making a shallow copy, or making a deep copy is a per-object choice. In C, struct assignment is a shallow copy, but you have an explicit distinction between a struct variable and a pointer to a struct. In Lua (or most other Smalltalk-family languages) there are no value types.
Objective-C is close to Lua/Smalltalk here, though it still uses C structs for value types. Things like NSRange, NSPoint, and so on are all value types that do not contain pointers and so can be safely shallow copied. You can return one from a function and it will not alias anything else. In Lua, you must be careful to create a new object (table) and return it, rather than returning an existing one. In Objective-C, Smalltalk-like objects are referenced only via pointer and so you must explicitly copy them (some classes know that they’re immutable and so will silently tweak refcount rather than copying).
Are they? I thought that they were simply immutable objects, like their Smalltalk equivalents. That’s certainly how they behave.
At the abstract machine level, you have a reference to an immutable string. When you append to a Lua string, you get a reference to a new string that is the result of the append operation. The old string may no longer be reachable and so is collected. Similarly, the number 4 is an immutable object. If you add 1 to 4, then you get a reference to the immutable object that represents the number 5. As an implementation detail, the fact that it’s a reference to the global immutable number 5 is encoded in the word that holds the reference and the underlying object doesn’t actually exist, but at the abstract machine level it’s just a reference to an immutable object.
If I write the following:
local x = 1
x = x + 1
print(x)
The first line is creating a variable x and setting it to a reference to the global immutable object that is a number with a value 1. The second line is calling its + method, which returns a reference to the global immutable number 2. When I print it, I have a reference to an object of type number. As an implementation detail, the reference is NaN boxed and holds the value of the number and the + operation is just implemented as arithmetic on that address.
If I print the type of this number and another object:
print(type(x))
local y = {}
print(type(y))
I get:
number
table
Note that neither of these say reference to table or reference to number, because the fact that it’s a reference is implicit.
Lua’s number is similar to Smalltalk’s SmallInt, which similarly embeds the value in the pointer, but remains an object and has no methods that can mutate the instances.
In Objective-C, we do similar tricks to embed small NSNumber and NSString instances in pointers. At the abstract machine level, you still have a pointer to an immutable NSNumber or NSString subclass, but as an implementation detail we are embedding the value in the pointer. Because the underlying type is immutable, this is a valid transform (indeed, if you do pointer comparison, it will appear as if they are pointers to the same object because they are pointer-sized integers that refer to the same abstract immutable object by encoding the same bit pattern in the pointer).
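The abstract-machine view above is observable from inside Lua itself: strings and numbers behave as immutable values with no distinguishable identity, while tables are references with identity. A quick sketch:

```lua
-- Strings are immutable: operations return new strings, never mutate.
local s = "ab"
assert(s == "a" .. "b")      -- equal contents always compare equal
assert(s:upper() == "AB")    -- "modifying" s really built a new string
assert(s == "ab")            -- ...and s itself is untouched

-- Tables have identity: two empty tables are different objects.
assert({} ~= {})
local a = {}
local b = a                  -- b references the same table
assert(a == b)
```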
Value types have mutation operations. I cannot mutate a Lua string or integer, I can just use operations that create new ones. C structs and C# value types support taking a reference to them and mutating them, but default to treating them as values.
You can always use value types as an implementation detail to implement something that looks like an immutable value in the type system, but the converse is not true.
The distinction between values and objects that I have usually seen is that objects have a distinct identity and values don’t. For example, value types in Java (project Valhalla) are immutable and have no identity.
Sorry, I should have said ‘Value types can have mutation operations’. Not having mutation operations is not part of their definition, whereas it is for immutable objects. The two concepts are orthogonal.
It looks like I was wrong about references to value types in C#. There’s a fairly rich set of things in C# that blur these lines in interesting ways but not in quite the way I thought.
The distinction between values and objects that I have usually seen is that objects have a distinct identity and values don’t
That’s a reasonable definition, but immutable objects blur that line a lot. Immutable objects may be interned, unless their identity is observable and the language makes guarantees that prevent it. Python does this for small numbers (as did Cocoa before stuffing them in object pointers) and Java can do this for strings (explicitly for dynamic strings, implicitly for literals), but they still have a unique identity, it’s just that you can create another object of the same identity if it happens to have the same value. The same applies to any language that does string interning.
Java made the mistake of requiring the boxed number objects to be unique as a result of having the language guarantee that new T() returned an object that compared not-identical-to every other object. This ensured that new Integer(1) != new Integer(1) was a language-level guarantee, which is surprising to a lot of programmers. Newer versions of the language worked around this by providing mechanisms for constructing Integer instances that were permitted to alias existing ones and used these for auto-boxing. This permitted implementations to embed any Integer instance constructed this way in the pointer (or keep a small table of common integers), while keeping ones that are created with new as real heap allocations.
Defining identity in a language is hard. In Verona, I want to enable the freeze operation (which converts a region into an immutable object graph) to perform interning, but not require it. If object identity is observable, that’s mostly fine, but if object identity is expressible as a number (e.g. an address) then it’s hard because you can build trees using that number or hash tables using that number as the input to a hash function (which may be simple truncation) and then interning breaks your data structures. In C/C++, this kind of pattern is common and leads to some efficient code. In Java or .NET, the equivalent is a non-unique hash code for each object that is typically its address at the first time the method is called, which then sets a flag and is captured if the object is relocated. This is fairly expensive to maintain and is not unique (even for live objects) and so can’t be used to key identity-based data structures. I don’t have a good solution here.
[…] if a value type is mutable, then mutations made to one value are not visible in another
This implies that value types may have mutation operations, whereas immutable objects do not (by definition).
If you have a reference to a value, if that value is mutated, you can observe the mutation through the reference. With an immutable object, you can’t change what 1 is.
I think the take-away is that immutable objects are one kind of values, but not all values are immutable objects (squares and rectangles).
I don’t think I’ll ever understand the way that programmers coming from C and C-like languages expect data to be copied implicitly. It’s so utterly foreign to me. If you want a copy, make a copy!
Well, it depends on how you learn the language and how well you already know C and C-like languages. In every tutorial I read back then when learning dynamically typed languages, the wording “variables reference data” was quite explicit: it’s a pointer, and copying the pointer won’t copy the pointed-to data. So I never had that surprise.
Good Python tutorials even mention this aspect of dynamically typed language when diving into keyword arguments:
def foo(x=[]): ...  # don't do this

def foo(x=None):    # do this
    if x is None:
        x = []
    ...
And they explain that the default value of the keyword argument is evaluated when the function is created, and all you have then is a reference to the same object for each call.
I honestly think that in order to be surprised by the “it’s all references” semantics, you must dive into the language without even reading a single tutorial about it. (Fun fact: that’s how I learned Erlang, by reading the source code of RabbitMQ. At the time I thought atoms were some form of global variables defined elsewhere, and the naming was explicit enough that I did not even need to know about that datatype. Silly me.)
I honestly think that in order to be surprised by the “it’s all references” semantics, you must dive into the language without even reading a single tutorial about it
Or if you read documentation that wasn’t written by a C programmer and thus the author didn’t even bother to mention things that they thought were self-evident. I went over a decade into my career without learning how weird C is about passing arguments to functions. It never would have occurred to me to explain this in documentation I wrote, because the idea of copies being made implicitly felt nearly incomprehensibly bizarre.
We also believe that the terms “call-by-value” and “call-by-reference” are so hopelessly muddled at this point (between students, instructors, blogs, the Web…) that finding better terminology overall would be helpful.
the author didn’t even bother to mention things that they thought were self-evident
When writing a tutorial for a newbie, you should not assume that anything is self evident, nothing is.
how weird C is about passing arguments to functions.
There is nothing weird about it; it’s just different semantics. “Copy and explicit reference” is common in systems languages and in functional programming; “reference and explicit copy” is common in dynamically typed languages.
When writing a tutorial for a newbie, you should not assume that anything is self evident, nothing is.
“Numbers are in base 10 by default. You can use a text editor to write your programs. Whitespace characters are used to separate tokens and are not considered part of identifiers. Calling a non-recursive function once executes the body of the function exactly once. When you put a number in a field of a data structure, the language won’t change that number implicitly. The compiler typically expects programs to be stored in files on disk. An if will only execute one of its branches, not both. The compiler will not delete files it is given as input. Creating a new data structure requires memory.” I could go on and on.
Every one of these facts seems as obvious to me as “calling a function does not result in data structures being copied implicitly because calling a function has nothing to do with copying data” seemed ten years ago. Listing out everything you assume to be self-evident is not feasible, and even if it were, no one would read a tutorial like that.
There are environments/tools/languages where this is not the case.
You can use a text editor to write your programs.
Again, this is not self-evident, considering there exists something called “visual scripting”.
Whitespace characters are used to separate tokens and are not considered part of identifiers.
There are languages where identifiers can have whitespaces, so once again, not self evident.
Calling a non-recursive function once executes the body of the function exactly once
I’ve taught many programming courses, and I can assure you I got this very question. Some students would assume that code is executed as soon as it is written, so putting a function in a module would execute it every time you import the module. Not everyone is able to grasp immediately the concepts you manipulate daily for the last 20 years.
The compiler typically expects programs to be stored in files on disk.
I’ve also had this question, from students used to IDEs who didn’t even understand what a compiler is, how it works, or how to use the terminal. Basically they wrote Java classes in their IDE and clicked a button; they had no idea the classes were stored as files, or that there was a compiler being called. The IDE completely abstracted this part away, which confused them.
An if will only execute one of its branches, not both.
I also got this confusion from some students.
I could go on and on.
Oh I’m sure you can find other examples you are used to. But you seem to forget that someone new to IT does not have the understanding, experience, and mental model that you have. At the beginning, students struggle with control flow, memory management (even when it’s managed by the compiler or runtime), etc.
So I stand by my claim, to be surprised by a language semantics, one must not have read said semantics. After decades of learning many programming languages, picking one up is faster, and I often tend to go look for the reference/specification, which must be, guess what: exhaustive and explicit.
After decades of learning many programming languages, picking one up is faster, and I often tend to go look for the reference/specification, which must be, guess what: exhaustive and explicit.
I agree the reference should be exhaustive and explicit.
However, your original claim was about tutorials, not reference documentation. I have never seen a tutorial that reads the way you describe.
Lack of classes is one of my favorite features of Lua.
I don’t really understand this. Lua is almost identical to Smalltalk when you ignore the syntax and Smalltalk classes map directly to and from Lua metatables. You can take a Lua program and write the same structure in Smalltalk using classes where the original used metatables or vice versa.
Is it really just the fact that it doesn’t use the word ‘class’ that you like? I found it quite jarring that Lua adopts Smalltalk semantics for almost everything but its own terminology everywhere.
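For concreteness, the metatable pattern that maps onto a Smalltalk class looks something like this (the `Point` example is illustrative): the “class” table doubles as the method dictionary, reached through `__index`:

```lua
-- A "class" is just a table whose __index metamethod plays the role
-- of method lookup, much like a Smalltalk class's method dictionary.
local Point = {}
Point.__index = Point

function Point.new(x, y)
  return setmetatable({ x = x, y = y }, Point)
end

function Point:magnitude()
  return math.sqrt(self.x * self.x + self.y * self.y)
end

local p = Point.new(3, 4)
assert(p:magnitude() == 5.0)
```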
Is it really just the fact that it doesn’t use the word ‘class’ that you like?
No, it’s that I don’t want inheritance to be coupled with encapsulation and polymorphism and other unrelated concepts.
I would rather have them available a la carte so I can use the techniques that are relevant for the problem at hand. Nearly every module in my program needs encapsulation. One or two might need polymorphism. It’s extraordinarily rare that inheritance is useful, but when it does happen, it’s always (IME) in a way that has nothing to do with classes. (the most recent example I can think of is key maps in a text editor; lua mode inherits from programming-mode which inherits from text-mode)
To quote Joe Armstrong: “You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.”
Edit: plus I have enough experience with Ruby to know that if classes are part of the language, then you will see a lot of programmers using them for things they’re not good at. You do actually see this in Lua anyway since of course you can build classes in userspace, but it’s mostly limited to applications and not libraries.
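The keymap example above maps naturally onto a bare `__index` chain, with no class machinery involved (the key bindings here are made up; the mode names come from the comment):

```lua
-- Inheritance a la carte: a chain of keymap tables linked by __index.
local text_mode = { ["C-s"] = "save" }
local prog_mode = setmetatable({ ["C-/"] = "comment" },
                               { __index = text_mode })
local lua_mode  = setmetatable({ ["C-e"] = "eval" },
                               { __index = prog_mode })

assert(lua_mode["C-e"] == "eval")     -- own binding
assert(lua_mode["C-/"] == "comment")  -- inherited from programming-mode
assert(lua_mode["C-s"] == "save")     -- inherited from text-mode
```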
This is a good read. Not religious. Not defensive. It reminds me of so many projects I’ve been on over the years. Pragmatic, here’s where we are, here’s where we’d maybe rather be but reality took us a different route.
I’ve never built anything in Lua (but I’ve hit it many times in its many various incarnations in lots of different systems), and it’s kind of cool to think that someone built a rather large system relying on it to this degree. I’ve built large systems in BASIC derivatives (dynamic type systems and all) in the distant past, and managed large Python projects in the less distant past, and it’s all doable – but as this article explains, it’s a lot easier once you’ve gotten in tune with the language itself. In the case of Lua, it sounds like understanding the table-centric approach is incredibly powerful in this regard.
EDIT: Oh, and if the author or anyone who knows the game is reading this: What’s the 30-second pitch on the game? If I enjoy Factorio, am I going to love this?
I’ve been using Lua for years at this point (so much that it’s the language that taught me programming!) and after many ups and downs it’s one of my favorites when it comes to design.
I’ve no darn clue how the designers managed to show so much restraint as to have one data structure which can do everything you’d ever ask for, and more (tables are really nice for building DSLs, because they have a little bit of syntax sugar - function { table = "argument" } - that goes a long way.)
The one thing that always hurts is the dynamic typing system, which I always felt made it hard to write software you can rely on… but maybe that was just my lack of skill. Because I’ve been recently working on a pure JavaScript project and I’m starting to turn. Dynamic typing may really not be that hard if you keep your modules isolated well. At least, I feel like it hasn’t been impeding me while building a GUI app - which is nice :)
(If anyone’s interested, I wrote about all of this in more detail here.)
Lua has the usual C-style syntax for calling functions:
some_function("abc")
but the grammar also allows you to call a function with a literal string or a literal table. This example is the same as the one above:
some_function "abc"
The string literal is most often used in conjunction with the require function (which is also a really cool design that enables ML-like modules, but that’s another rabbit hole I could go into…)
local mod = require "mod"
As for literal tables, the syntax looks like this:
some_function { key = "value" }
which is where you can do some real magic. Combined with Lua tables being both hash tables and arrays in one, this can lead to some extremely expressive code that resembles XML in structure, but unlike XML, is:
extremely easy to query and traverse, because attributes live under string keys, and children live under numeric keys
really ergonomic, because you don’t need the <tag></tag> syntax
actually Turing-complete! which lets you build simpler things like config formats with variables, but also things like UI DSLs akin to JSX - except without the hideous syntactical hack of pasting XML into JS. It’s all just Lua all the way down!
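A tiny sketch of the kind of table-based DSL described above, using the `function{...}` call sugar; the `panel`/`label` helpers and field names are made up for illustration, with no real UI library assumed:

```lua
-- node(kind) builds a constructor that tags a table with its kind.
local function node(kind)
  return function(t) t.kind = kind; return t end
end
local panel, label = node "panel", node "label"

-- Attributes live under string keys, children under numeric keys.
local ui = panel {
  id = "root",                     -- attribute: string key
  label { text = "hello" },        -- child: numeric key (index 1)
  label { text = "world" },        -- child: numeric key (index 2)
}

assert(ui.kind == "panel" and ui.id == "root")
assert(ui[1].text == "hello" and ui[2].kind == "label")
```

Querying and traversal then fall out for free: `ipairs(ui)` walks the children, while named fields are just table lookups.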
Apparently, Hades also has a lot of the game logic in Lua and it is accessible with the original comments in the released game.
Balatro is an immaculate piece of game design. It’s put a real damper on my hobby projects too…
I’m not entirely convinced that Balatro’s incredible series of if-else chains is the one way to write card effect logic in Lua.
lol, the code feels awfully brute-force, but then again it’s pretty straightforward to read.
I haven’t seen Balatro but I’ve definitely seen VVVVVV, and it’s a beauty as well: https://github.com/TerryCavanagh/VVVVVV/blob/2.2/desktop_version/src/Game.cpp#L612
This giant frickin’
switch
is what drives most of the game logic. I love it. Sometimes the best games have just the worst code by “cleanness” standards.Lack of classes is one of my favorite features of Lua.
I don’t think I’ll ever understand the way that programmers coming from C and C-like languages expect data to be copied implicitly. It’s so utterly foreign to me. If you want a copy, make a copy!
But I understand the way that unlearning bad habits from previous languages can be a large part of the work in learning a new language.
I’ve never heard an Ex Cathedra argument applied to programming language design before!
It takes a while to get used to the way modules work in Lua, but once I learned them, they quickly became one of my favorite language features. However, for a 60kloc codebase I think you still need self-enforced discipline and structure if you rely on modules alone.
I have had great luck using Lua’s (very unique) support for first-class environments as a way to isolate different parts of the codebase from each other. If you break your code apart into multiple environments, you can strictly define the smallest set of functions which are necessary to bridge them. This means that instead of (in my case) one 12kloc game, I had four separate 3kloc codebases to deal with, and I never felt the unstructured flexibility of Lua to be burdensome at that scale. I haven’t seen very many people discuss this as a strategy for dealing with the lack of a type system, but I wrote some about it here: https://technomancy.us/181
This is coming from a rustacean, but I feel like a scripting language with first-class reference types might be very interesting and possibly extremely useful. That way you always know whether something is going to be copied or referenced. Go does this with its pointers, but also makes the Billion Dollar Mistake; any experienced Go people out there with opinions on this?
Can you elaborate on what you mean by that? All the scripting languages I know have references for their data structures; you always know something is going to be referenced because that’s the only reasonable way to pass arguments to a function. If you want a copy, you can do that by … making a copy.
Are you saying you should be able to have a data structure and attach behavior to it saying “always copy this implicitly every time it’s passed to a function”? That seems like it would be confusing and error-prone, but maybe I’m misunderstanding you.
If you want it to be easier to make a copy, I think it should be done using a syntactic shorthand at the call site rather than being an inherent property of the data structure.
It isn’t though! There’s plenty of time that (small, static) structures are passed by value. Essentially all the time in immutable languages for example (I’ve been messing with Elixir a lot recently), but also in Rust and C. What if everything was pass-by-value by default instead of being by-value for some types and by-reference for others? Basically my goal is to able to explicitly have unique pointer semantics of some kind or another for a scripting language. This isn’t a terribly well-thought-out idea, but maybe we start with all values being immutable, and then have something like:
OCaml does something kinda like this, but it’s kinda squirrelly, so I don’t know that this would really be better or more convenient – OCaml is already kinda squirrelly with mutation anyway. I suppose my goal is to make a scripting language that starts out with “immutable everywhere”, ‘cause if you have that then you really don’t care what is a value and what is a reference, and then allow you to relax that in certain places so you can get the best of both worlds. I’m also imagining maybe a `ref` type that can be shared and a `box` type that is mutable but not share-able. Or maybe you start with move semantics no matter what; I don’t know.

Basically Rust’s data model changed my world and I think there’s a lot of room for languages with the most world-changing parts of its data model but a lot more relaxation around Rust’s theme of “never, ever allocate dynamic memory or touch a refcount unless you’re told to”.
Well sure, but in the case of immutable data that’s an optimization that should not meaningfully affect the semantics of the language. If the runtime or compiler wants to make copies behind my back because it’ll be faster, it’s free to do so as long as I won’t notice.
For mutable data it’s just weird; why would I want the language to make a copy for me without asking, but then for the copy to be mutable too? Every case I can think of where I would in theory want something like that would be better off served by explicitly creating an immutable copy instead, which the language would then be free to optimize however it likes. (I definitely think languages should make this easier, but still never implicit.)
I think the level of confusion that C programmers exhibit when learning languages that don’t have the flaw of implicit argument copying is pretty good evidence that it was a language design mistake we should be leaving in the past. Say what you mean!
Edit: I re-read your example and it sounds like you’re talking about something different, since your example doesn’t contain any function calls except `print`. I’m talking about the implicit copying that happens to arguments in C when a function is called. I don’t see anything objectionable in your example, except I think it’s a bad idea to use the same notation to bind a local variable as you do for mutating a field of a data structure. OCaml has the right idea where it uses `=` for locals and `<-` for field mutation.

Boggle, where did you get this idea from?
Probably not so much for C, but C++ objects can have copy constructors and so the implicit behaviour of either copying a reference, shallow copy, or deep copy, is a per-object choice. In C, struct assignment is a shallow copy but you have an explicit differentiation between a struct variable and a pointer to a struct. In Lua (or most other Smalltalk-family languages) there are no value types.
Objective-C is close to Lua/Smalltalk here, though it still uses C structs for value types. Things like NSRange, NSPoint, and so on are all value types that do not contain pointers and so can be safely shallow copied. You can return one from a function and it will not alias anything else. In Lua, you must be careful to create a new object (table) and return it, rather than returning an existing one. In Objective-C, Smalltalk-like objects are referenced only via pointer and so you must explicitly copy them (some classes know that they’re immutable and so will silently tweak refcount rather than copying).
Lua strings and numbers are value types.
Are they? I thought that they were simply immutable objects, like their Smalltalk equivalents. That’s certainly how they behave.
At the abstract machine level, you have a reference to an immutable string. When you append to a Lua string, you get a reference to a new string that is the result of the append operation. The old string may no longer be reachable and so is collected. Similarly, the number 4 is an immutable object. If you add 1 to 4, then you get a reference to the immutable object that represents the number 5. As an implementation detail, the fact that it’s a reference to the global immutable number 5 is encoded in the word that holds the reference and the underlying object doesn’t actually exist, but at the abstract machine level it’s just a reference to an immutable object.
If I write the following:
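(A minimal reconstruction consistent with the description that follows:)

```lua
local x = 1
x = x + 1
print(x)  -- 2
```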
The first line is creating a variable
x
and setting it to a reference to the global immutable object that is a number with a value 1. The second line is calling its + method, which returns a reference to the global immutable number 2. When I print it, I have a reference to an object of type number. As an implementation detail, the reference is NaN boxed and holds the value of the number and the + operation is just implemented as arithmetic on that address.If I print the type of this number and another object:
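(Again reconstructing the missing listing; something like:)

```lua
local x = 2
print(type(x), type({}))  -- prints: number  table
```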
I get `number` and `table`. Note that neither of these says reference to table or reference to number, because the fact that it’s a reference is implicit.
Lua’s number is similar to Smalltalk’s SmallInt, which similarly embeds the value in the pointer, but remains an object and has no methods that can mutate the instances.
In Objective-C, we do similar tricks to embed small `Number` and `NSString` instances in pointers. At the abstract machine level, you still have a pointer to an immutable `NSNumber` or `NSString` subclass, but as an implementation detail we are embedding the value in the pointer. Because the underlying type is immutable, this is a valid transform (indeed, if you do pointer comparison, it will appear as if they are pointers to the same object because they are pointer-sized integers that refer to the same abstract immutable object by encoding the same bit pattern in the pointer).

How do you tell the difference between a value type and an immutable object?
Value types have mutation operations. I cannot mutate a Lua string or integer, I can just use operations that create new ones. C structs and C# value types support taking reference to them and mutating them, but default to treating them as values.
You can always use value types as an implementation detail to implement something that looks like an immutable value in the type system, but the converse is not true.
Huh, that’s not a definition I have seen before.
The distinction between values and objects that I have usually seen is that objects have a distinct identity and values don’t. For example, value types in Java (project Valhalla) are immutable and have no identity.
Sorry, I should have said ‘Value types can have mutation operations’. Not having mutation operations is not part of their definition, whereas it is for immutable objects. The two concepts are orthogonal.
It looks like I was wrong about references to value types in C#. There’s a fairly rich set of things in C# that blur these lines in interesting ways but not in quite the way I thought.
That’s a reasonable definition, but immutable objects blur that line a lot. Immutable objects may be interned, unless their identity is observable and the language makes guarantees that prevent it. Python does this for small numbers (as did Cocoa before stuffing them in object pointers) and Java can do this for strings (explicitly for dynamic strings, implicitly for literals), but they still have a unique identity, it’s just that you can create another object of the same identity if it happens to have the same value. The same applies to any language that does string interning.
Java made the mistake of requiring the boxed number objects to be unique as a result of having the language guarantee that `new T()` returned an object that compared not-identical-to every other object. This ensured that `new Integer(1) != new Integer(1)` was a language-level guarantee, which is surprising to a lot of programmers. Newer versions of the language worked around this by providing mechanisms for constructing `Integer` instances that were permitted to alias existing ones and used these for auto-boxing. This permitted implementations to embed any `Integer` instance constructed this way in the pointer (or keep a small table of common integers), while keeping ones that are created with `new` as real heap allocations.

Defining identity in a language is hard. In Verona, I want to enable the freeze operation (which converts a region into an immutable object graph) to perform interning, but not require it. If object identity is observable, that’s mostly fine, but if object identity is expressible as a number (e.g. an address) then it’s hard because you can build trees using that number or hash tables using that number as the input to a hash function (which may be simple truncation), and then interning breaks your data structures. In C/C++, this kind of pattern is common and leads to some efficient code. In Java or .NET, the equivalent is a non-unique hash code for each object that is typically its address at the first time the method is called, which then sets a flag and is captured if the object is relocated. This is fairly expensive to maintain and is not unique (even for live objects) and so can’t be used to key identity-based data structures. I don’t have a good solution here.
According to wikipedia: https://en.wikipedia.org/wiki/Value_type_and_reference_type
This implies that value types may have mutation operations, whereas immutable objects do not (by definition).
If you have a reference to a value, and that value is mutated, you can observe the mutation through the reference. With an immutable object, you can’t change what `1` is.

I think the take-away is that immutable objects are one kind of values, but not all values are immutable objects (squares and rectangles).
Well, it depends on how you learn the language and how much you know C or C-like languages. In every tutorial I read back then when learning dynamically typed languages, the wording “a variable references data” is quite explicit: it’s a pointer, and copying the pointer won’t copy the pointed-to data. So I never had that surprise.
Good Python tutorials even mention this aspect of dynamically typed language when diving into keyword arguments:
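(The tutorial excerpt isn’t reproduced; the classic illustration, with names invented here, is the mutable default argument:)

```python
def append_item(item, bucket=[]):   # the default list is created once,
    bucket.append(item)             # when the function is defined
    return bucket

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2]: the same list object, shared across calls
```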
And they explain that the default value of the keyword argument is evaluated when the function is created, and all you have then is a reference to the same object for each call.
I honestly think that in order to be surprised by the “it’s all references” semantics, you must dive into the language without even reading a single tutorial about it (fun fact, that’s how I learned Erlang, by reading the source code of RabbitMQ; at that time I thought atoms were some form of global variables defined elsewhere, and the naming was explicit enough that I did not need to even know that datatype, silly me).
Or if you read documentation that wasn’t written by a C programmer and thus the author didn’t even bother to mention things that they thought were self-evident. I went over a decade into my career without learning how weird C is about passing arguments to functions. It never would have occurred to me to explain this in documentation I wrote, because the idea of copies being made implicitly felt nearly incomprehensibly bizarre.
I was just reading a paper about this a while back; confusion about aliasing and copying with data structures is actually quite common among learners: https://cs.brown.edu/~sk/Publications/Papers/Published/lk-smol-tutor/
One major conclusion from the paper:
When writing a tutorial for a newbie, you should not assume that anything is self evident, nothing is.
There is nothing weird about it, it’s just different semantics. “copy and explicit reference” is common in system languages, and in functional programming. “reference and explicit copy” is common in dynamically typed languages.
“Numbers are in base 10 by default. You can use a text editor to write your programs. Whitespace characters are used to separate tokens and are not considered part of identifiers. Calling a non-recursive function once executes the body of the function exactly once. When you put a number in a field of a data structure, the language won’t change that number implicitly. The compiler typically expects programs to be stored in files on disk. An `if` will only execute one of its branches, not both. The compiler will not delete files it is given as input. Creating a new data structure requires memory.” I could go on and on.

Every one of these facts seems as obvious to me as “calling a function does not result in data structures being copied implicitly because calling a function has nothing to do with copying data” seemed ten years ago. Listing out everything you assume to be self-evident is not feasible, and even if it were, no one would read a tutorial like that.
There are environments/tools/languages where this is not the case.
Again, this is not self-evident, considering there exists something called “visual scripting”.
There are languages where identifiers can have whitespaces, so once again, not self evident.
I’ve taught many programming courses, and I can assure you I got this very question. Some students would assume that code is executed as soon as it is written, so putting a function in a module would execute it every time you import the module. Not everyone is able to grasp immediately the concepts you manipulate daily for the last 20 years.
I’ve also got this question, from students used to IDEs who didn’t even understand what a compiler is, how it works, or how to use the terminal. Basically they wrote Java classes in their IDE and clicked a button; they had no idea the classes were stored as files, or that a compiler was being called. The IDE completely abstracted this part away, which confused them.
I also got this confusion from some students.
Oh I’m sure you can find other examples you are used to. But you seem to forget that someone new to IT does not have the understanding, experience, or mental model that you have. At the beginning, students struggle with control-flow, memory management (even when it’s managed by the compiler or runtime), etc…
So I stand by my claim, to be surprised by a language semantics, one must not have read said semantics. After decades of learning many programming languages, picking one up is faster, and I often tend to go look for the reference/specification, which must be, guess what: exhaustive and explicit.
Take a look at this section of the Rust reference: https://doc.rust-lang.org/reference/lexical-structure.html
Now, to conclude, I totally believe anyone who will say “I don’t write good documentation”, that is indeed an unfortunate but common trend in IT ;)
I agree the reference should be exhaustive and explicit.
However, your original claim was about tutorials, not reference documentation. I have never seen a tutorial that reads the way you describe.
That little bit of Python gets me every time because Ruby doesn’t cache default values. :)
I don’t really understand this. Lua is almost identical to Smalltalk when you ignore the syntax and Smalltalk classes map directly to and from Lua metatables. You can take a Lua program and write the same structure in Smalltalk using classes where the original used metatables or vice versa.
Is it really just the fact that it doesn’t use the word ‘class’ that you like? I found it quite jarring that Lua adopts Smalltalk semantics for almost everything but its own terminology everywhere.
No, it’s that I don’t want inheritance to be coupled with encapsulation and polymorphism and other unrelated concepts.
I would rather have them available a la carte so I can use the techniques that are relevant for the problem at hand. Nearly every module in my program needs encapsulation. One or two might need polymorphism. It’s extraordinarily rare that inheritance is useful, but when it does happen, it’s always (IME) in a way that has nothing to do with classes. (the most recent example I can think of is key maps in a text editor; lua mode inherits from programming-mode which inherits from text-mode)
To quote Joe Armstrong: “You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.”
Edit: plus I have enough experience with Ruby to know that if classes are part of the language, then you will see a lot of programmers using them for things they’re not good at. You do actually see this in Lua anyway since of course you can build classes in userspace, but it’s mostly limited to applications and not libraries.
This is a good read. Not religious. Not defensive. It reminds me of so many projects I’ve been on over the years. Pragmatic, here’s where we are, here’s where we’d maybe rather be but reality took us a different route.
I’ve never built anything in Lua (but I’ve hit it many times in its many various incarnations in lots of different systems), and it’s kind of cool to think that someone built a rather large system relying on it to this degree. I’ve built large systems in BASIC derivatives (dynamic type systems and all) in the distant past, and managed large Python projects in the less distant past, and it’s all doable – but as this article explains, it’s a lot easier once you’ve gotten in tune with the language itself. In the case of Lua, it sounds like understanding the table-centric approach is incredibly powerful in this regard.
EDIT: Oh, and if the author or anyone who knows the game is reading this: What’s the 30 second pitch on the game? If I enjoy Factorio, am I going to love this?
I’ve been using Lua for years at this point (so much that it’s the language that taught me programming!) and after many ups and downs it’s one of my favorites when it comes to design.
I’ve no darn clue how the designers managed to show so much restraint as to have one data structure which can do everything you’d ever ask for, and more (tables are really nice for building DSLs, because they have a little bit of syntax sugar - `function { table = "argument" }` - that goes a long way.)

The one thing that always hurts is the dynamic typing system, which I always felt made it hard to write software you can rely on… but maybe that was just my lack of skill. Because I’ve been recently working on a pure JavaScript project and I’m starting to turn. Dynamic typing may really not be that hard if you keep your modules isolated well. At least, I feel like it hasn’t been impeding me while building a GUI app - which is nice :)
(If anyone’s interested, I wrote about all of this in more detail here.)
Your web site is hilarious and I recommend everyone read it.
Can you go into the DSL aspect more?
Lua has the usual C-style syntax for calling functions:
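(For instance, with a hypothetical function `f`:)

```lua
local function f(s) return "hello, " .. s end
print(f("world"))  -- the usual parenthesized call
```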
but the grammar also allows you to call a function with a literal string or a literal table. This example is the same as the one above:
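(Reconstructing the missing snippet, again with a hypothetical `f`:)

```lua
local function f(s) return "hello, " .. s end
print(f "world")    -- same call, no parentheses needed
print(f [[world]])  -- long-string literals work too
```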
The string literal is most often used in conjunction with the `require` function (which is also a really cool design that enables ML-like modules, but that’s another rabbit hole I could go into…)

As for literal tables, the syntax looks like this:
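(A sketch of the table-call form; `greet` and its fields are invented here:)

```lua
-- A function taking "named arguments" via a single table:
local function greet(opts) return opts.greeting .. ", " .. opts.name end
print(greet { greeting = "Hello", name = "Ada" })  -- Hello, Ada
```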
which is where you can do some real magic. Combined with Lua tables being both hash tables and arrays in one, this can lead to some extremely expressive code that resembles XML in structure, but unlike XML, doesn’t need the closing `<tag></tag>` syntax.

Here’s an example of what you could build with this sugar, as presented on my website:
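(The website’s actual example isn’t reproduced here; as a stand-in, a self-contained sketch where the `tag`, `div`, and `p` helpers are all invented for illustration:)

```lua
-- Build an XML-like DSL out of nothing but table-call sugar.
local function tag(name)
  return function(t)
    local attrs, children = "", ""
    for k, v in pairs(t) do
      if type(k) == "string" then                 -- hash part -> attributes
        attrs = attrs .. string.format(' %s="%s"', k, v)
      end
    end
    for _, child in ipairs(t) do                  -- array part -> children
      children = children .. tostring(child)
    end
    return "<" .. name .. attrs .. ">" .. children .. "</" .. name .. ">"
  end
end

local div, p = tag "div", tag "p"

print(div { class = "container",
  p { "Hello, world" },
})
-- <div class="container"><p>Hello, world</p></div>
```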
I’d be really happy to use something like this instead of JSX ^^
The most beautiful part of this is that it only took them two extra syntax rules to make this possible. Like really, all they had to do is turn this:

`args ::= '(' [explist] ')'`

into this:

`args ::= '(' [explist] ')' | tableconstructor | LiteralString`
(from https://lua.org/manual/5.4/manual.html#9)
Isn’t that just so pragmatically awesome.
See this: https://premake.github.io/docs/Your-First-Script
It’s entirely valid Lua code, which is just calling functions, but it looks like some custom programming language.
The semantics of Lua make it easy to extend and make it look like something else.
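A self-contained sketch of the same trick: `workspace` and `configurations` are premake’s function names, but the stub bodies below are invented here just to show that the “config language” is ordinary Lua calls.

```lua
-- Stubs standing in for premake's real implementation.
local log = {}
local function workspace(name)
  log[#log + 1] = "workspace " .. name
end
local function configurations(list)
  log[#log + 1] = "configurations " .. table.concat(list, ", ")
end

-- Thanks to the literal-argument sugar, this reads like premake:
workspace "HelloWorld"
configurations { "Debug", "Release" }

print(table.concat(log, "\n"))
```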