I remember non-American programmers, in the days before widespread Unicode adoption, describing how they had to type (and read) Å as introducing a block, or £include to include a header file. In that situation at least, digraphs were an improvement.
It’s very funny to me that digraphs were the second attempt at working around the lack of a few ASCII codepoints in some writing systems. The first attempts, trigraphs, were truly awful and are now deprecated / being removed.
What??! They removed trigraphs??! What nonsense esoterica can we ask in coding interviews instead??!
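For anyone who never met them: a trigraph is two question marks plus one more character, and the preprocessor swaps it for a single punctuation character before doing anything else, even inside string literals. That is why the `??!` above is a joke: with trigraphs enabled it turns into `|`. A minimal sketch of that replacement step in C, assuming the nine standard trigraphs. `replace_trigraphs` is a made-up helper name for illustration, not part of any real compiler or library:

```c
#include <stddef.h>
#include <string.h>

/* Replace trigraph sequences in `src`, writing the result to `dst`.
   `dst` must have room for strlen(src) + 1 bytes.
   Returns the length of the output string. */
size_t replace_trigraphs(const char *src, char *dst)
{
    /* Third character of each trigraph and the character it stands for.
       The two leading question marks are implied, so this source file
       contains no trigraph sequences itself. */
    static const char third[]  = "=/'()!<>-";
    static const char actual[] = "#\\^[]|{}~";
    size_t out = 0;

    while (*src != '\0') {
        const char *hit = NULL;

        /* Only look at the third character when two question marks
           are followed by something other than the terminator. */
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0')
            hit = strchr(third, src[2]);

        if (hit != NULL) {
            dst[out++] = actual[hit - third];  /* three characters become one */
            src += 3;
        } else {
            dst[out++] = *src++;               /* ordinary character, copy as is */
        }
    }
    dst[out] = '\0';
    return out;
}
```

Note that the sketch builds its tables from the third characters only. Writing the trigraphs out literally in the source would get them replaced by any compiler that still has them switched on.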
A shout-out to autoconf’s @<:@ and @:>@ quadrigraphs for generating [ and ] in the configure script. [ and ] quote arguments to macros and aren’t easily escaped otherwise. You often want to render default values inside square brackets.
Ah, the digraphs you sometimes see mentioned in older C books (never encountered them in actual code, ’though I do remember a good deal of K&R style code bases).
It’s interesting how early languages interacted with the available key codes. The most obvious being systems where you only had upper case. The Algol family with its begin and end didn’t have the brace issue, but if I remember correctly, some early examples used the left-pointing arrow (←) for assignment instead of “:=” and couldn’t use underscore because that’s how early ASCII worked (which means that viewed only a few years later, some people probably wondered why an underscore is used for assignment).
(Not that begin/end was that easy with Algol’s “stropping”, but that’s an entirely different tale)
Not that we don’t still have this issue today. We could use all kinds of more semantically meaningful Unicode characters, but the lack of an easy way to enter them prevents us from doing it. And not just in programming languages: the lack of easily accessible dashes and proper apostrophes and quotation marks also leads to less-than-ideal substitutes being way too common (I’ve yet to see a proper DIN 2137 E1 keyboard here in Germany).
It’s not keycodes or codepoints but keys and keyboards.
Early typewriters absolutely had a _: just take a close look at the Remington 2 if you don’t believe me - you would use this with overstrike to underline some text on a page.
Early teletypes on the other hand, didn’t make as much use of the overstrike feature, so they often replaced the underscore with a left arrow but this wasn’t universal: plenty of typewriters of this era had underscores.
But neither teletypes nor typewriters had so many brackets and braces, and whilst modern keyboards in the US and UK often have them, here in Portugal we definitely need keys for º and ª and ç and lots of others (é, ñ and so on), so my MacBook Pro’s keyboard doesn’t have [ or ] keys. I think the Mac German layout is similar but maybe you don’t have a ç key. In any event I’m lucky I learned C on a teletype because I still don’t have those keys!
You wouldn’t know it looking at my code because I’ve got this in my vimrc:
imap <% {
imap %> }
imap <: [
imap :> ]
APL programmers use a lot of symbols too, including ← for assignment, but I have to press so many keys to get it on the mac with the default layout that I just don’t bother on my laptop.
It absolutely was character sets. See page 21 of the rationale. A lot of the motivation was due to the way ISO 646 international ASCII used the higher punctuation character codes for extra letters, so {} could not be represented at all.
Incorrect: That document you referred to isn’t “the rationale” for digraphs (or possibly anything), and that’s not the only mistake in it.
The digraphs were introduced in ISO/IEC 9899-1990-AMD1 (1995) and it makes it very clear it’s about internationalisation.
From the very first page, I quote:
Use of these features can help promote international portability of C programs. … Subclauses 6.1.5 and 6.1.6 of ISO/IEC 9899: 1990 are adjusted to include the following six additional tokens. In all aspects of the language, these six tokens
<: :> <% %> %: %:%:
behave, respectively, the same as these existing six tokens
[ ] { } # ##
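To make the quoted mapping concrete, here is a small sketch in C that writes one function with the digraphs and then again with the usual punctuation. The function names and the `PASTE` macro are made up for illustration. The only assumption is a compiler that accepts the digraphs from the 1995 amendment:

```c
%:include <stddef.h>

/* %:%: is the digraph spelling of the ## token-pasting operator. */
%:define PASTE(a, b) a %:%: b

/* Sum the first n elements of an array, spelled with digraphs:
   <% %> stand in for { }, <: :> for [ ], and %: for #. */
int sum_digraphs(const int values<::>, size_t n)
<%
    int total = 0;
    for (size_t i = 0; i < n; i++) <%
        total += values<:i:>;
    %>
    return total;
%>

/* The same function with the usual spellings, for comparison. */
int sum_plain(const int values[], size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}

/* Expands to a function named digraph_answer. */
int PASTE(digraph_, answer)(void) <% return 42; %>
```

Calling `sum_digraphs` and `sum_plain` on the same array gives the same result, since the compiler sees identical tokens either way. Unlike trigraphs, digraphs are ordinary tokens, so a `<%` inside a string literal is left alone.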
The most influential language that used ← for assignment was Smalltalk, which was weird because as a 1970s language it came after ASCII replaced ← with _. Dunno why Xerox PARC used out-of-date ASCII.
Anyway, Smalltalk was a wordy language with a preference for longer identifiers than had been typical in older languages. Because they lacked _ they used camelCase to separate words.
Object-oriented programmers carried the Smalltalk camelCase identifier style to later languages that had _ and could have supported less typographically ugly styles.
I think at that time ASCII wasn’t as standardized as one would assume. Stanford had their own version, and I think this was also used by the Algol-60 variant Knuth used to write the first TeX version.
Every Smalltalk I’ve used has used := for assignment and ^ for return, but rendered them as ← and ꜛ. I didn’t know that they were characters you could type on the Alto!
My old tweet with this fun fact was surprisingly popular :-) It was prompted by Allen Wirfs-Brock saying, “Very few realize how much the software development world has been influenced over the last 20 years by the infiltration of former Smalltalk programmers into key communities.”
I had no idea C was supporting this out of the box.
“an alternate C syntax” led me to believe it would be a post with more substance than this :/
I don’t see where the contradiction is?
Internationalization -> national variants of ISO646 without {} -> trigraphs, but trigraphs are super ugly -> digraphs
The issue of keyboards is tied up with internationalization but it wasn’t the prime reason for trigraphs or digraphs.
Is there any specific reason you prefer to use the digraphs instead of the option key to access the brackets?
It’s usually faster and easier to press two keys on opposite sides of the keyboard than to chord two keys (or three) on the same side of the keyboard; the option key is also in an awkward place requiring more hand travel.