the HTML tokenizer won’t know to skip Markdown code blocks and code spans.
You’d insert the markdown into the HTML the same way you’d put pure text in: by escaping it before stitching it with other HTML, using a DOM model to create a text node for it, or using some construct like <![CDATA[ ]]> (and taking care to escape ]]> within that).
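As a rough illustration of the first and last of those options, here is a minimal Python sketch using only the standard library. The helper names are hypothetical, and the CDATA variant assumes the surrounding document is parsed as XML/XHTML, since CDATA sections are not honored in ordinary HTML content.

```python
import html


def as_escaped_text(markdown_source: str) -> str:
    """Escape &, <, > and quotes so the Markdown survives as plain text in HTML."""
    return html.escape(markdown_source)


def as_cdata(markdown_source: str) -> str:
    """Wrap the Markdown in a CDATA section.

    A literal "]]>" would close the section early, so it is split
    across two adjacent CDATA sections.
    """
    safe = markdown_source.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


source = "Use `<div>` & friends, or even ]]> itself"
print(as_escaped_text(source))
print(as_cdata(source))
```

Either way, the Markdown reaches the consumer byte-for-byte as text, so the HTML tokenizer never has to know anything about code spans or code blocks.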
Raw HTML pass-through is not my favorite feature of Markdown, but it is a
feature, and since our aim was to give a spec for Markdown, not invent
something completely different, we have to deal with it.
Right. For CommonMark to be what it is trying to be—a spec that clarifies but doesn’t redefine markdown—HTML passthru must be there.
HTML pass-through is the only way to create, e.g., definition lists in CommonMark.
If people have developed a common way of expressing terms and definitions in plain text, the result, if not wrapped in the appropriate semantic HTML, would at least be readable by humans. And a future markdown or commonmark could start recognizing such a syntax at some point. I see that there are other discussions in this forum hoping for commonmark to define a syntax for table creation. Yes, eliminating the ability to use the features of HTML which do not yet have an analogue in user-friendly commonmark syntax would be an issue. But if the user can still convey the data in a readable way in pure text form without the use of the HTML feature, it might be acceptable to people who do not want to expose content authors to HTML. And I think tables are a bigger deal than definition lists because the most natural way to display tabular data when authoring plain text would probably assume fixed width fonts which would mean abusing code blocks…
Currently I am using the following for my particular use case:
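// Assumes CommonMark.NET: using CommonMark; using System.Text.RegularExpressions;
public string ToHtml(string terms) => CommonMarkConverter.Convert(
    // Disable HTML passthru because I don’t like it. But then I
    // manually have to restore to get things like “<http://blah.org>”
    // to work. Of course this would break input like “`&lt;http://`”.
    Regex.Replace(
        terms.Replace("<", "&lt;"),  // escape every "<" so no raw HTML survives
        "&lt;(https?:)",             // find the escaped opener of an autolink
        "<$1"));                     // and restore its "<"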
I have seen some other discussion in this forum of trying to modularize commonmark so that features can be more easily mixed and matched. I think HTML passthru would be a great candidate for being optional. Website template authors might like the magical behavior of <div> followed by a blank line allowing them to interlace rich HTML and quickly-written markdown text. Users contributing content to a website appreciate that their naturally written plaintext magically gets a facelift when *However* renders as an italicized “However”, while these same users would find it confusing if <div mysteriously disappears along with the remainder of the line.
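To make the limits of the pre-processing workaround from earlier in this post concrete, the string rewriting can be replayed on a few inputs. This is only a sketch: it re-states the two replacement steps in Python, does not call any CommonMark implementation, and `neutralize_html` is a hypothetical name.

```python
import re


def neutralize_html(text: str) -> str:
    """Escape every "<", then restore the ones that open an http(s) autolink."""
    escaped = text.replace("<", "&lt;")
    return re.sub(r"&lt;(https?:)", r"<\1", escaped)


samples = [
    "<http://blah.org>",  # autolink: should come through untouched
    "<div>hi</div>",      # raw HTML: should be neutralized
    "`&lt;http://`",      # code span whose literal text is rewritten
    "`<div>`",            # code span that picks up a stray entity
]
for sample in samples:
    print(repr(sample), "->", repr(neutralize_html(sample)))
```

The first two inputs behave as intended. The last two are altered before the parser ever sees them, and because CommonMark does not interpret entities inside code spans, the reader would see text the author never typed. A parser-level switch for HTML passthru would avoid this, since only the parser knows where code spans begin and end.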
So, would a “user-friendly” variant of commonmark be something that commonmark should define, or should that be left up to individual website authors?