Qubyte Codes

Marqdown

Published

Last updated

ふりがな:

Markdown is the standard for writing in techie circles these days, but it's pretty minimal. For a readme it's all you need, but if you create a site around Markdown like I have then you pretty quickly bump into its limitations. Markdown is deliberately limited, so it's no fault of the language or its creator!

Nevertheless, over time I've added my own tweaks and extensions upon Markdown, so I've decided to document them, and name the dialect Marqdown. Naming may seem a little arrogant, but it's mostly to disambiguate what I'm writing with more common Markdown variants.

My variant is based on the default configuration provided by marked, with additions layered on top. This is mostly the original flavour of Markdown with a few deviations to fix broken behaviour. As I add features I'll document them in this post.

Footnotes

I use footnotes[1] from time to time. The way I've implemented them makes the superscript a link to the footnote text, and the footnote text itself has a backlink to the superscript, so you can jump back to where you were.

The footnote in the previous paragraph is encoded like this:

I use footnotes[^][sparingly] from time to time.

This was an interesting feature to implement because it produces content out of the regular flow of the document. The markdown engine had to be abused a bit to create the superscript links first and keep a list of their footnote texts. Once the document is rendered, a post-render routine checks for any footnote texts, and when there's at least one it appends a section with an ordered list of footnotes. Another complication is index pages. For the blog posts index page only the first paragraph of each post is used, and footnote superscripts have to be removed from those.

Languages

HTML supports language attributes. Most of the time a (well-built) page will have a single language attribute on the opening <html> tag itself, setting the language for the entire document.

I write notes in mixed English and Japanese as I learn the latter. When working with CJK text it's particularly important to give elements containing such text the appropriate language tag so that Chinese characters are properly rendered (there are divergences which are important).

I wrote a Markdown syntax extension to add these tags. Since my documents are mostly in English, this remains as the language attribute of each page as a whole. For snippets of Japanese I use the syntax extension, which looks like:

The text between here {ja:今日は} and here is in Japanese.

This snippet renders to:

<p>
  The text between here <span lang="ja">今日は</span> and here is in Japanese.
</p>

Simple enough. The span is unavoidable because there is only text within it and text surrounding it. Where the renderer gets smart is in eliminating the span! If the span is the only child of its parent, the renderer eliminates the span by moving the language attribute to the parent. For example:

- English
- {ja:日本語}
- English

migrates the language attribute to the parent <li> to eliminate a span:

<ul>
  <li>English</li>
  <li lang="ja">日本語</li>
  <li>English</li>
</ul>

Similarly, the renderer is smart enough to see when the span has only one child, and can move the language attribute to the child to eliminate the span. Example:

{ja:_すごい_}

migrates the language attribute to the spans only child, an <em>:

<em lang="ja">すごい</em>

This becomes particularly important in the case of my notes, where it's common to nest ruby elements inside these language wrappers. There's a ruby annotation in the next section, and you'll see the language attributes appear directly on the ruby element if you inspect it.

As with footnotes, the language attribute migration and span elimination is handled using JSDOM after a markdown document is rendered as part of a post-render routine. In the future I may look into adapting marked to render directly to JSDOM rather than to a string.

Ruby annotations

I'm studying Japanese. It's pretty common to see annotations to help with the pronunciation of words containing Chinese characters. This could be because the text is intended for learners like me, but it's also common to see it for less common words, or where the reading of a word may be ambiguous.

These annotations typically look like kana rendered above or below the word (when Japanese is written left-to-right), or to one side (when Japanese is written from top to bottom). Ruby annotations are not unique to Japanese, but in the Japanese context they're called 振(ふ)り仮名(がな) (furigana), and you can see them right here in the Japanese text in this sentence! The code for it looks like this:

^振,ふ,り,,仮名,がな^

The delimiters are the carets, odd elements are regular text, and even elements are their annotations. So, ふ goes above 振, nothing goes above り (it's already a kana character), and がな goes above 仮名.

There are actually specific elements for handling ruby annotations, so what you see rendered is only from HTML and CSS! They're pretty fiddly to work with manually though, so this extension saves me a lot of time and saves me from lots of broken markup.

Highlighted text

The syntax of this extension is borrowed from elsewhere (I didn't invent it). This addition allows me to wrap stuff in <mark> elements. By default, this is a bit like using a highlighter pen on text. The syntax looks like:

Boring ==important== boring again.

which renders to:

<p>
  Boring <mark>important</mark> boring again.
</p>

which looks like:

Boring important boring again.

This is another extension I use heavily in my language notes to emphasize the important parts of grammar notes.

Fancy maths

Now and then I do a post with some equations in. I could render these elsewhere, but I like to keep everything together for source control. Add to that, I want to render the maths statically to avoid client side rendering (and all the JS I'd have to include to do that).

I settled on another common markdown extension to do this, which is to embed LaTeX code. The previous extensions are all inline, whereas maths comes both in inline and block contexts. I use temml to render LaTeX directly to presentational MathML. Inline mode uses single $ symbols as delimiters, and blocks use double $$ on their own lines above and below the content.

In the past I used MathJax. Visually the SVG MathJax produces is superior to MathML, but the latter is now acceptable, and <math> elements are more appropriate than <svg>, and much less bulky.

This:

$$
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
$$

Results in:

x=−b±b2−4ac2ax = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
  1. sparinglyꜛ

Backlinks


Feel like sharing or responding?

Comments or corrections? Toot to me!

This blog supports webmentions. If your blog supports webmentions and you've linked to this post, then a mention will have been dispatched to me. These are viewed and published manually, so please allow some time. If you would like to manually send me a webmention, please submit the URL to it here:

Edit page.