
Org Mode Syntax Is One of the Most Reasonable Markup Languages to Use for Text


Disclaimer: this is a very nerdy blog entry. It is about lightweight markup languages and why I think that Org mode syntax is the best lightweight markup language for many use-cases. And with lightweight markup language, I do mean the syntax, the way you express headings, lists, font variations such as bold face or italic, and such things.

Please do note that this is not about Emacs at all. This is about Org mode syntax and its advantages even when used outside of Emacs. You can type Org mode syntax in vim, notepad.exe, Atom, Notepad++, and all other text editors out there. And in my opinion it does have advantages compared to the other, common lightweight markup standards such as Markdown, AsciiDoc, Wikitext or reStructuredText.

Of course, Org mode syntax is my favorite syntax. Despite my personal choice you will see that I've got some pretty convincing arguments that underline my statement as well. So this is not just a matter of personal taste.

If you already have a grin on your face because you don't have any clue what this is all about: keep on reading. It makes an excellent example for making fun of nerds at your next dinner party. ;-)

Org Mode Syntax Is Intuitive, Easy to Learn and Remember

Here you are. This is almost everything you need to know about Org mode syntax:

 * This Is A Heading
 ** This Is A Sub-Heading
 *** And A Sub-Sub-Heading

 Paragraphs are separated by at least one empty line.

 *bold* /italic/ _underlined_ +strikethrough+ =monospaced=
 [[http://Karl-Voit.at][Link description]]
 http://Karl-Voit.at → link without description

 - list item
 - another item
   - sub-item
     1. also enumerated
     2. if you like
 - [ ] yet to be done
 - [X] item which is done

 : Simple pre-formatted text such as for source code.
 : This also respects the line breaks. *bold* is not bold here.	  

And yes, for the people with advanced syntax in mind, this is not everything. Just the basics. A common objection is that source code needs a begin and an end marker instead of a string prefix like : and Org mode syntax provides this as well:

 #+BEGIN_SRC python
 myresult = 42 * 23
 print('Hello Europe! ' + str(myresult))
 #+END_SRC	  

I've seen many coworkers who typed Org mode markup when taking notes in their text editor. And they did not even know anything about it. So it is that intuitive I'd say.

While I was learning Org mode, I did not even use a cheat-sheet for the syntax as I normally do. It was very natural for me to type Org mode syntax right from the start.

Tables are a bit more complicated, as in all other markup languages I know of:

 | My Column 1 | My Column 2 | Last Column |
 |-------------+-------------+-------------|
 |          42 | foo         | bar         |
 |          23 | baz         | abcdefg     |
 |-------------+-------------+-------------|
 |          65 |             |             |	  

You most probably won't type a table like this outside of Emacs. The manual alignment without tool-support is very tedious. But even here you are able to deliver a perfectly fine Org mode table by simply ignoring the alignment altogether:

 | My Column 1|My Column 2 | Last Column |
 |-
 | 42 | foo | bar|
 | 23 | baz | abcdefg|
 |-
 | 65 |||	  
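To illustrate why the alignment does not matter, here is a minimal sketch of a table reader in Python. It is not how Emacs or pandoc parse tables; the function name parse_org_table is made up for this example, and it only assumes that rows start with a pipe and that lines starting with |- are horizontal rules.

```python
def parse_org_table(text):
    """Parse a simple Org table into a list of rows (lists of cell strings)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table line
        if line.startswith("|-"):
            continue  # horizontal rule such as |- or |---+---|
        # Drop the leading pipe and, if present, the trailing one.
        inner = line[1:]
        if inner.endswith("|"):
            inner = inner[:-1]
        # Cell content is whatever sits between the pipes, minus padding.
        rows.append([cell.strip() for cell in inner.split("|")])
    return rows


aligned = """
| My Column 1 | My Column 2 | Last Column |
|-------------+-------------+-------------|
|          42 | foo         | bar         |
|          23 | baz         | abcdefg     |
|-------------+-------------+-------------|
|          65 |             |             |
"""

unaligned = """
| My Column 1|My Column 2 | Last Column |
|-
| 42 | foo | bar|
| 23 | baz | abcdefg|
|-
| 65 |||
"""

# Both spellings carry exactly the same cell data.
print(parse_org_table(aligned) == parse_org_table(unaligned))
```

Since the padding is stripped from every cell, the neatly aligned table and the sloppy one end up as the same rows.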

Org Mode Syntax Is Standardized

This is an almost ridiculous argument because in my opinion a markup is of no use when it is not the same for tool A as for tool B.

However, there are markup languages that are different. For example the very widely used markup language named Markdown has many flavors to choose from:

Pandoc lists six different Markdown flavors as output formats. This is an absolutely bad situation which foils the original idea behind lightweight markup languages. When some web service tells me that I can use "Markdown" for a text field, I have to dig deeper to find out which of those many different Markdown standards the web page is talking about. After this I will have to continue and look for a cheat-sheet of this dialect because nothing is more difficult to differentiate than multiple standards that are almost the same but not really the same. A usability hell. I get furious every time I have to enter this hell.

With Markdown, the original (inconsistently designed) syntax format is a minimal set that made much sense when used for typing emails and so forth. Later on, various tools needed more syntax elements for obvious reasons. For example, tools wanted to have footnotes or tables. Those syntax add-ons extended the original Markdown. However, they were not standardized.

This resulted in a zoo of very similar but incompatible Markdown flavors. Those differences cause information loss when moving from one "Markdown" tool to another "Markdown" tool because Markdown is not Markdown in most cases.

In contrast to that, the syntax of Emacs Org mode as the one and only original form has - by far - the largest set of syntax elements and various extensions by modules. Any other adaptation of this syntax in other tools chose a proper subset of the original syntax elements. This makes data transitions much smoother and less error-prone, and it causes less data loss.

An additional issue (which is not related to the syntax itself) is the lack of standardization of file extensions for files using Markdown syntax. I've seen .md, .mkdn, .markdown and even .txt (in Markdown).

With Org mode syntax, life is easy. The snippet from the previous section explains all there is. Any tool that interprets Org mode syntax accepts this simple and easy to remember syntax. The extension is always .org.

Org Mode Syntax Is Consistent

Different Heading Approaches

Many lightweight markup languages do offer multiple ways of typing headings.

There are basically three ways of defining headings:

  1. Prefix headings
  2. Pre- and postfix headings
  3. Underlined headings

Here are some examples for each category:

 Prefix headings:

 # Heading 1
 ## Heading 2
 ### Heading 3

 Pre- and postfix headings:

 = Heading 1 =
 == Heading 2 ==
 === Heading 3 ===

 Underlined headings:

 Heading 1
 =========

 Heading 2
 ~~~~~~~~~

 Heading 3
 *********	  

I prefer the prefix heading style. Org mode syntax uses this as well, with * as the prefix character. The more asterisks, the deeper the level of the heading is.
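The "count the asterisks" rule is simple enough to sketch in a few lines of Python. This is only an illustration and not a complete Org parser; the name heading_level is made up for this example, and it assumes that a heading starts in the first column with one or more asterisks followed by a space.

```python
import re

# One or more asterisks at the very start of the line, then a space, then the title.
HEADING = re.compile(r"^(\*+) +(.*)$")


def heading_level(line):
    """Return (level, title) for an Org heading line, or None for anything else."""
    match = HEADING.match(line)
    if match is None:
        return None
    return len(match.group(1)), match.group(2).strip()


print(heading_level("* This Is A Heading"))
print(heading_level("*** And A Sub-Sub-Heading"))
print(heading_level("*bold* text is not a heading"))
```

The level is just the number of asterisks. There is nothing to synchronize or to look up in a cheat-sheet.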

Pre- and postfix headings offer bad usability. The user has to manually synchronize the number of prefix characters with the number of postfix characters. And it is totally unclear how something like = heading == with different numbers of pre/postfix characters is going to turn out when being interpreted.

And in case the user already used a markup language with simple prefix headings, it is not logical why there is the need for the postfix characters at all.

Even worse than this is the underlined heading category. The user is completely irritated for multiple reasons. Besides the tedious manual work to align the stupid heading characters with the heading title, it is not clear what characters must be used for those heading lines. If you've got a bigger document with different levels of headings you get confused which heading character stands for which heading level.

Are the tilde characters level one? Or was it the equals characters? And how about asterisks? Without a cheat-sheet, the occasional markup user is completely lost.

It gets even worse: some markup languages let you choose your "order" of heading characters. This results in weird situations. For example one author is starting to write a reStructuredText document using her favorite heading syntax. A second author is joining in and has to analyze the document in order to know what heading syntax he must use.

reStructuredText

In the reStructuredText mode of Emacs you can find the following function:

You can visualize the hierarchy of the section adornments in the current buffer by invoking rst-display-adornments-hierarchy, bound on C-c C-a C-d. A temporary buffer will appear with fake section titles rendered in the style of the current document. This can be useful when editing other people's documents to find out which section adornments correspond to which levels.

Yes, you got it right, it is true: this function's only purpose is to generate a dummy-hierarchy of headings to visualize which markup has to be used for heading 1, which one for heading 2 and so forth just for this single document. What a bad design decision when you need such hacks just to know what a heading should look like, even in a markup you are already familiar with.
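To show what such a helper has to figure out, here is a rough Python sketch. It assumes, as far as I understand the reStructuredText rules, that the level of an adornment character is defined by the order in which it first shows up in the document. The function name adornment_levels is made up, and the sketch only looks at underlined titles, not at titles with an additional overline.

```python
def adornment_levels(text):
    """Map each underline character to a heading level, in order of first use."""
    lines = text.splitlines()
    levels = {}
    for title, underline in zip(lines, lines[1:]):
        title = title.strip()
        underline = underline.strip()
        is_adornment = (
            title != ""
            and underline != ""
            and len(set(underline)) == 1       # one repeated character
            and not underline[0].isalnum()     # punctuation only
            and len(underline) >= len(title)   # at least as long as the title
        )
        if is_adornment:
            # The first time a character is seen, it gets the next free level.
            levels.setdefault(underline[0], len(levels) + 1)
    return levels


doc_a = """
Heading 1
=========

Heading 2
~~~~~~~~~

Heading 3
*********
"""

doc_b = """
Title
~~~~~

Section
=======
"""

print(adornment_levels(doc_a))
print(adornment_levels(doc_b))
```

The same tilde line is level two in the first document and level one in the second. You can't know the level of a heading without scanning everything that comes before it.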

Here is one more: some markup languages even allow mixed heading styles. You can use an underlined heading style for heading level 1, a prefix style for level 2, another underlining style for level 3 and so forth. Now the chaos is a perfect one.

AsciiDoc

I'm using this cheatsheet as reference for this section on AsciiDoc.

 Level 1
 -------
 Text.

 Level 2
 ~~~~~~~
 Text.

 Level 3
 ^^^^^^^
 Text.

 Level 4
 +++++++
 Text.	  

There is no good way of memorizing the characters used for the various levels of headings. Within the text, you can't see clearly which heading has a higher level.

Do you have to manually align the characters below the heading text? Does it work when the heading text has a different length than the following line?

 == Level 1
 Text.

 === Level 2
 Text.

 ==== Level 3
 Text.

 ===== Level 4
 Text.	  

Why are there two different options for the same very basic syntax element? What about mixing them? Does it work?

Why don't they start with one equal sign? Level one consists of two equal signs and so forth. Why this off-by-one mismatch?

 * bullet
 * bullet
   - bullet
   - bullet
 * bullet
 ** bullet
 ** bullet
 *** bullet
 *** bullet
 **** bullet	  

and

 - bullet
   * bullet	  

Again: why two different ways of writing a simple list?

First list version: Why do I have to use multiple asterisks but just one dash?

 a. letter
 b. letter
    .. letter2
    .. letter2
        .  number
        .  number
            1. number2
            2. number2
            3. number2
            4. number2
        .  number
    .. letter2
 c. letter	  

One letter, two dots, one dot → any logic seems to be gone.

 Term 1::
     Definition 1
 Term 2::
     Definition 2
     Term 2.1;;
         Definition 2.1
     Term 2.2;;
         Definition 2.2
 Term 3::
     Definition 3
 Term 4:: Definition 4
 Term 4.1::: Definition 4.1
 Term 4.2::: Definition 4.2
 Term 4.2.1:::: Definition 4.2.1
 Term 4.2.2:::: Definition 4.2.2
 Term 4.3::: Definition 4.3
 Term 5:: Definition 5	  

Definition indented or not? The level after the definition text is hard to follow in the layout.

 .Multiline cells, row/col span
 |====
 |Date |Duration |Avg HR |Notes

 |22-Aug-08 .2+^.^|10:24 | 157 |
 Worked out MSHR (max sustainable
 heart rate) by going hard
 for this interval.

 |22-Aug-08 | 152 |
 Back-to-back with previous interval.

 |24-Aug-08 3+^|none

 |====	  

Please note that this has many table options that Org mode syntax does not provide, such as multi-line cells. However, this comes at the price of a table syntax I would not be able to remember, at least not for occasional use.

Web Links and Simple Markup in Different Markup Languages

Let's have a look at a different markup element: external links. As you may remember, a link in Org mode syntax looks like this:

 [[http://Karl-Voit.at][my home page]]	  

The only difficult thing here is to remember that the URL is at the beginning and the description follows after the URL. Many markup languages add additional and unnecessary levels of difficulty.
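Because both parts use the same kind of brackets, links are also easy to find for a tool. The following Python sketch is my own simplification and not the official grammar: the names ORG_LINK and find_links are made up, and it assumes that neither the URL nor the description contains square brackets.

```python
import re

# [[target]] or [[target][description]]; no nested brackets are handled.
ORG_LINK = re.compile(r"\[\[([^\]\[]+)\](?:\[([^\]\[]+)\])?\]")


def find_links(text):
    """Return a list of (target, description) tuples; description may be None."""
    return [(m.group(1), m.group(2)) for m in ORG_LINK.finditer(text)]


sample = "Visit [[http://Karl-Voit.at][my home page]] or [[http://example.com]]."
print(find_links(sample))
```

A link without a description simply yields None as the second element.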

Here are some examples from Wikipedia and comments by me where a user might be irritated.

AsciiDoc:

 http://example.com[Text]	  

The form is simple but for complex URLs, the [Text] might look like being part of the URL itself. Not beautiful but at least something I could live with.

Me writing Markdown. Image with two buttons "[text](link)" and "(text)[link]" and somebody who is sweating.
Tweet by jitbit with a meme that emphasizes the hard to remember link syntax.

Markdown:

 [Text](http://example.com)
 [Text](http://example.com "Title")	  

Brackets or parentheses first? Why use different kinds of markup characters in the first place instead of only brackets? Is the Title part of the URL? Why not part of Text? Very confusing design decisions from my point of view.

#Pandoc 3 introduced #Markdown support for #wikilinks
Mastodon-announcement of pandoc to support wikilinks in Markdown but faced non-standardized syntax, trying to detect all possible variations. (Source: https://fosstodon.org/@pandoc/109814498586039387)

reStructuredText:

 `Text <http://example.com/>`_	  

Holy moly. This is some weird stuff. First, you have to use grave accents ` and not apostrophes '. Then what about the underscore character at the end? This is about as complicated as defining a simple link can get. I'd even prefer the hard to type HTML version of linking. A disaster for something which has "lightweight" in its class name.

Even with the most basic formatting syntax, Markdown (I assume in all flavors of it) shows a high level of inconsistency:

 _italic_, **bold**, `monospace`, ~~strikethrough~~	  

Why is it that italic and monospace require only one character before and after the string in-between and others require two? It's inconsistent and therefore hard to learn and error-prone.

Markdown has even different formatting for the very same thing:

 This is **bold text**.
 This is __bold text__.
 Text**is**bold
 NOT allowed: Text__is__bold	  
 This is *italic*.
 This is _italic_.
 Text*is*italic
 NOT allowed: Text_is_italic	  
 Horizontal Rules with three or more asterisks (***),
 dashes (---), or underscores (___) on a line by themselves	  

And now compare the Org syntax for the same formats:

 /italic/, *bold*, ~monospace~, +strikethrough+
 three or more dashes on one line for horizontal rules	  

Org made different choices for the syntax. In my opinion with better mnemonics but you could disagree here. However, with one leading and one trailing syntax character, it's for sure easier to learn, type and recognize due to better consistency.
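The consistency argument can be checked with a tiny Python sketch. The two dictionaries only contain the markers from the examples above, and the helper names wrap and marker_lengths are made up for this illustration.

```python
# Markers taken from the Org and Markdown examples in this section.
ORG_MARKERS = {"italic": "/", "bold": "*", "monospace": "~", "strikethrough": "+"}
MARKDOWN_MARKERS = {"italic": "_", "bold": "**", "monospace": "`", "strikethrough": "~~"}


def wrap(text, style, markers):
    """Surround text with the leading and trailing marker of the given style."""
    marker = markers[style]
    return f"{marker}{text}{marker}"


def marker_lengths(markers):
    """Return the sorted set of marker lengths used by a markup."""
    return sorted({len(marker) for marker in markers.values()})


print(wrap("bold", "bold", ORG_MARKERS), marker_lengths(ORG_MARKERS))
print(wrap("bold", "bold", MARKDOWN_MARKERS), marker_lengths(MARKDOWN_MARKERS))
```

Org needs exactly one marker length for all four styles, while Markdown mixes one and two characters.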

Org Mode Syntax Can Be Easily Typed

The simple syntax of Org mode does not imply typing unnecessary characters. You don't have to manually align something like underlined headings. Anybody using a simple text editor is very fast at adding markup for headings, font variations, and so forth. The previous section proved that other markup languages clearly fail in many cases.

Org Mode Syntax Makes Sense Outside of Emacs

You don't have to use the Emacs editor to write and work with Org mode markup text. As I mentioned above, many people already do so just because Org mode syntax is an intuitive and clean way of typing text characters.

When you've got text information in Org mode markup, you can process it with many tools. The most prominent and most important examples are files pushed to a GitHub repository and the Swiss army knife named Pandoc, which is able to convert Org mode syntax to dozens of formats like HTML, odt (LibreOffice), docx (Word), LaTeX, PDF, and so forth.

Many lightweight markups don't come with any decent tool support at all. When I'm using, e.g., markdown in any solution that doesn't do more than syntax highlighting, there is no real reason why this can not be written in Org mode syntax instead.

So, yes, Org mode syntax came with perfect tool support in the first place. But my point is that the syntax itself has so many advantages that it should be adopted for all the "tool-support is not the main focus"-applications as well.

Screenshot from Mastodon
My statement on Mastodon on the usefulness of the Org-mode syntax as syntax language.

You can write and render Org mode syntax in GitHub and GitLab. You can and should use it in any text editor like Notepad for your personal notes. Especially when there is no tool support, because the syntax is easy to type manually.

Org Mode Syntax Has Excellent Tool Support (If You Want)

As I mentioned in the beginning, this is not an article about Emacs. Nevertheless, for anybody not familiar with Emacs, I have to mention that Emacs is a tool that supports writing Org mode syntax (among other things) in a perfect way.

You might start with mouse-only usage. There are menu items with all important functions. For the users that want to get a minimum of efficiency, the menu items show you the keyboard shortcuts you might want to use.

For Org mode syntax it is really easy to learn. Basically you just have to use TAB to toggle the collapsing and expanding of headings, lists, and blocks. It's Alt and the arrow keys to move around headings, list items, and even table columns/rows. Ctrl-Return creates a new heading or list item without the need of entering the markup characters and manually matching indentation levels at all.

That's it. With those three things you're good to write Org mode syntax efficiently. The basic file open/save, finding help, exiting Emacs stuff is accessible with icons or the menu. No need to learn more keyboard shortcuts if you don't want to.

Having experienced this great tool-support, users typically are eager to learn more. You don't have to. You might be happy with Org mode for capturing minutes of meetings and your shopping list. However, others do master a few additional things and write whole eBooks within Org mode.

Since this article is not about Emacs, I want to explain why I mention the perfect Emacs support. My point is that Org mode deserves to be adopted for all kinds of use-cases where a simple to read and simple to type lightweight markup is needed. However, if you want to have decent tool support, there is no lightweight markup out there which has better tool support than Org mode within Emacs. I did not find any tool support for Markdown, AsciiDoc, Wikitext or reStructuredText anywhere that could compete with the cozy Org mode syntax support within Emacs.

Yes, there could be more tool support outside of Emacs but this is my point: change this. This syntax should support more people.

Summary

Lightweight markup languages are designed to be used with a minimum effort compared to full-blown and therefore more complicated markup languages such as HTML or LaTeX.

Some are doing their job better than others. In my experience, many design decisions of widely adopted markups such as Markdown, AsciiDoc or reStructuredText (and others) are questionable from a usability point of view. At least I do have some issues when I have to use them in my daily life.

Unfortunately, I hardly see any people out there using Org mode syntax as a markup language outside of Emacs although there are very good reasons for it as an easy to learn and easy to use markup language.

With this blog article I wanted to point out the usefulness of Org mode even when you are not using Emacs as a writing tool. Of course, when you are using Emacs to type Org mode syntax, you get probably the most advanced tool-support for typing lightweight markup on top. But this was not the point of this article.

Update 2021-11-28: Due to the high demand and discussions of this article, the idea grew in me that we do have to address some issues of Org-mode mentioned here. Therefore, I came up with Orgdown. You should check out my Orgdown motivation article.

Backlinks:

Org-mode markup deserves to be adopted beyond Emacs
Tweet by UnixToolTip with a link to this article.

Comments

"revocation" has a valid point related to the missing standardization of Org mode. Here is my comment on this:

The statements here refer to a lightweight markup, the basic things of Org mode syntax. I explicitly listed "headings, lists, font variations such as bold face or italic, and such things".

What I do not cover here is a full syntax statement or standard. In my opinion, currently this is not possible outside of Emacs for various reasons.

Of course, there are variations in interpreting Org mode files between Emacs and pandoc. Also, pandoc only supports a sub-set of Org mode. Otherwise, pandoc would have to re-implement or embed Emacs for parsing purposes.

In this specific case, pandoc seems to have a stricter parser related to leading spaces for #-lines, or keywords. I'm pretty sure that the pandoc project accepts this issue as a bug. When in doubt, the interpretation of Emacs is the definition, or golden standard, of Org mode syntax. Even this beta-version of a syntax definition does not mention optional spaces before keywords. The manual mentions org-element-parse-buffer and org-lint, which would most probably be the best choice for defining the official standard if you were to search for one.

However, this does not relate at all to the intention of this article: the design of the (basic) Org mode syntax compared to other lightweight markup languages. All the issues mentioned where other markups show inconsistencies and usability issues where Org mode seems to have advantages still do apply here. Completely independent of the standardization argument. My personal belief is that if there were more use of Org mode syntax elements outside of Emacs, there would be a much higher pressure on formally defining Org mode as a syntax which pandoc and even Emacs could use as the golden standard.

So far, there is not even the necessity of defining this golden standard because nobody outside of the Emacs community knows or even is using Org mode. And this is what I tried to change a bit because other markup languages do tend to hurt my geeky soul when I do have to use them. ;-)

Comment by Ian Zimmerman

Ian wrote an email comment on reStructuredText (ReST):

Most of your argument makes at least some sense, but you miss the point of the ReST link syntax: the URL doesn't have to be inline! The normal style (which I use for my blog) is like this:
blah blah blah follow the interesting link at `foo`_ blah blah blah
more blah bluh bleh
.. _foo: http ://www.foo.example.com
which among other things means that you can mention and link foo multiple times without repeating the URL.
Maybe there is a way of doing that in org - I don't know, I'm a total beginner. That's why I'm reading your post.

(Note: I added an additional space character after http in the example above to work around this lazyblorg issue.)

First, about that question related to defining a URL once and referring to it multiple times. I'm not aware of an Org mode feature that provides this functionality except footnotes, which are not specific to URLs. An even more general approach could be the use of the very powerful macro replacement. However, when discussing those things, we're far away from the discussion of markup that is lightweight.

Second, I acknowledge that I did not cover this possibility to define URL references in ReST markup. In my opinion and when I try to look at it from a hypothetical user perspective, I do think that this still proves my point. There is no logical explanation for that underscore character. I already had to add those back-ticks as syntax element. Why combine those back-ticks with an additional underscore? This does not make sense to me. Furthermore, where should I place this block where the URLs are defined? After each paragraph? Visual clutter. At the end of the text? Too fragile IMHO. And don't get me started on those two dots and the underscore switching places. This is certainly not putting more weight in the thought that ReST syntax has a higher level of logic.

For somebody looking for a smooth, easy to remember lightweight markup language, I still think that nothing beats Org mode in general. Even when we totally neglect the Emacs integration and 99 percent of its feature-set which allows for the most complex requirements while still being consistent and lightweight in syntax.

