3bmd
2024-10-12
Markdown processor in CL using the esrap parser.
Common Lisp Markdown -> html converter, using esrap for parsing, and grammar based on peg-markdown.
Currently a bit slow and uses lots of RAM for large documents (particularly when using the top-level doc parser instead of reading documents as a sequence of blocks), but seems to handle the tests from peg-markdown reasonably well.
Note that this library processes Markdown and not the newer (and better specified) CommonMark, so it may not behave quite as expected for people used to the latter. See issue #53 for some discussion of why CommonMark support isn't currently planned, and why it would probably live in a separate library if it were implemented.
todo:
- clean up API
- figure out how to automate testing (closure-html + `tree-equal`? need some way to normalize whitespace though), and add tests
- optimize grammar
- optimize esrap
Extensions:
- If `3bmd:*smart-quotes*` is non-`NIL` while parsing, some extra patterns will be recognized and converted as follows (outside code blocks):
  - `'single quoted strings'` -> `&lsquo;...&rsquo;`, like ‘single quoted string’ (with slightly ugly heuristics to avoid contractions)
  - other single quotes `'` -> `&apos;`, '
  - `"double quoted strings"` -> `&ldquo;...&rdquo;`, like “double quoted string”
  - ellipsis `...` or `. . .` -> `&hellip;`, …
  - en dash `--` -> `&ndash;`, –
  - em dash `---` -> `&mdash;`, —
  - left right arrow `<->` -> `&harr;`, ↔
  - left arrow `<-` -> `&larr;`, ←
  - right arrow `->` -> `&rarr;`, →
  - left right double arrow `<=>` -> `&hArr;`, ⇔
  - left double arrow `<=` -> `&lArr;`, ⇐
  - right double arrow `=>` -> `&rArr;`, ⇒
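
A minimal sketch of turning on the smart punctuation patterns for one conversion. It assumes 3bmd is installed through Quicklisp and that `3bmd:parse-string-and-print-to-stream` is the entry point you use; `render-smart` is a hypothetical helper name, not part of the library. Evaluate the forms one at a time (or `load` the file) so the `3bmd` package exists before the later forms are read.

```lisp
;; Assumption: Quicklisp is available and provides the "3bmd" system.
(ql:quickload "3bmd")

(defun render-smart (markdown)
  "Hypothetical helper: render MARKDOWN to an HTML string with
3bmd:*smart-quotes* bound to a non-NIL value during parsing."
  (let ((3bmd:*smart-quotes* t))
    (with-output-to-string (out)
      (3bmd:parse-string-and-print-to-stream markdown out))))

;; Exercises the quote, dash, arrow, and ellipsis patterns.
(render-smart "\"Quoted\" text -- with an arrow -> and an ellipsis...")
```

Binding the variable with `let` rather than setting it globally keeps the extension scoped to a single call.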
- Loading `3bmd-ext-wiki-links.asd` adds support for parsing simple `[[]]` style wiki links: If `3bmd-wiki:*wiki-links*` is non-`NIL` while parsing, wiki links of the form `[[foo]]` or `[[foo|...]]` will be parsed, where `...` is one or more optional args separated by `|` characters. By default, wiki links will just print the `foo` part as normal text. To integrate into an actual wiki, users should bind `3bmd-wiki:*wiki-processor*` during printing, and define a method on `3bmd-wiki:process-wiki-link` that specializes on the value of `3bmd-wiki:*wiki-processor*` to create an HTML link from the `foo` and arguments. (API subject to change.)
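
The integration described above might look roughly like the following sketch. The lambda list `(wiki normalized-target formatted-target args stream)` is an assumption based on a reading of the extension source and should be checked against the version you have loaded, since the API is subject to change. The `:example-wiki` key, the `/wiki/` URL prefix, and `render-wiki` are hypothetical names used only for illustration.

```lisp
;; Assumption: both systems are available through Quicklisp.
(ql:quickload '("3bmd" "3bmd-ext-wiki-links"))

;; Assumed lambda list; verify against the loaded extension.
(defmethod 3bmd-wiki:process-wiki-link
    ((wiki (eql :example-wiki)) normalized-target formatted-target args stream)
  (declare (ignore args))
  ;; Emit a plain HTML link; the /wiki/ prefix is a made-up URL scheme.
  (format stream "<a href=\"/wiki/~a\">~a</a>"
          normalized-target formatted-target))

(defun render-wiki (markdown)
  "Hypothetical helper: render MARKDOWN with wiki links enabled and
dispatched to the :example-wiki method above."
  (let ((3bmd-wiki:*wiki-links* t)
        (3bmd-wiki:*wiki-processor* :example-wiki))
    (with-output-to-string (out)
      (3bmd:parse-string-and-print-to-stream markdown out))))

(render-wiki "See [[foo]] for details.")
```

A real wiki would also need to escape the target before placing it in the `href`; that step is left out here.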
- Loading `3bmd-ext-code-blocks.asd` adds support for GitHub-style fenced code blocks, with `colorize` support: If `3bmd-code-blocks:*code-blocks*` is non-`NIL` while parsing, in addition to normal indented verbatim blocks, ```` ``` ```` can be used to delimit blocks of code:

      ```
      This block doesn't specify a language for colorization
      ```

  or

      ```lisp
      ;;; this block will be colorized as Common Lisp
      (defun foo (bar) (list bar))
      ```

  Language names ignore case and whitespace, so `Common Lisp` and `commonlisp` are treated the same; see `3bmd-code-blocks:*colorize-name-map*` for the full list of supported language names, or add names to it to recognize a custom colorize `coloring-type`. If a language name is not specified after the opening ```` ``` ````, `3bmd-code-blocks:*code-blocks-default-colorize*` can be set to one of the keywords naming a `coloring-type` recognized by `colorize` to specify a default; otherwise the block will not be colorized.

  Can optionally use Pygments instead of `colorize` by setting `3bmd-code-blocks:*renderer*` to `:pygments`. Lexer and formatter options (`-O`) can be specified like ```` ```c++|linenos=1 ````.

  Some attempt has been made to avoid interpretation of the options by the shell when calling `pygmentize`, but you should probably audit the code and test the interaction with the implementation of `uiop:run-program` on your implementation of choice before using it on untrusted input. The Pygments HTML formatter creates arbitrary files when passed `-Ofull,cssfile=filename`, so parameters with the substring `cssfile` are ignored (`noclobber_cssfile=True` is also set by default, but that only prevents overwriting, not creation). Users with untrusted input may want to audit that as well to make sure there are no other dangerous options or ways to get around the exact substring check.

  Can optionally use Chroma instead of `colorize` or Pygments by setting `3bmd-code-blocks:*renderer*` to `:chroma`. Change the embedded theme of the `:chroma` code block via `3bmd-code-blocks:*chroma-style*`. The various styles for Chroma can be viewed via `chroma --list`.

  If no highlighting is desired, for example when using a JavaScript highlighter, it is possible to specify `:nohighlight` as `3bmd-code-blocks:*renderer*`. In this case the `pre` tag's `class` attribute is rendered with the defined language. So:

      ```lisp
      (defun foo ())
      ```

  Is rendered as: `<pre class="lisp"><code>...</code></pre>`

  To change the format used for rendering the `class` attribute value you can set a different format to `*code-blocks-pre-class-format*`, which defaults to `~a` in order to render the language as parsed from the triple ticks block. I.e.: setting the format `(setf 3bmd-code-blocks:*code-blocks-pre-class-format* "brush: ~a;")` will render: `<pre class="brush: lisp;"><code>...</code></pre>`

  Inline code spans (like `` `this` ``) can also be optionally highlighted with `3bmd-ext-code-blocks`. Set `3bmd-code-blocks:*render-code-spans*` to true, and set `3bmd-code-blocks:*render-code-spans-lang*` to the desired language.
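
As a rough illustration of the `:nohighlight` path, the sketch below renders one fenced block with a custom class format, which is the sort of setup a client-side JavaScript highlighter would expect. It assumes the systems load through Quicklisp and that `3bmd:parse-string-and-print-to-stream` is the entry point; `render-fenced` and the `language-~a` format string are hypothetical choices, not library defaults.

```lisp
;; Assumption: both systems are available through Quicklisp.
(ql:quickload '("3bmd" "3bmd-ext-code-blocks"))

(defun render-fenced (markdown)
  "Hypothetical helper: render MARKDOWN with fenced code blocks enabled,
no server-side highlighting, and a custom class format on the pre tag."
  (let ((3bmd-code-blocks:*code-blocks* t)
        (3bmd-code-blocks:*renderer* :nohighlight)
        (3bmd-code-blocks:*code-blocks-pre-class-format* "language-~a"))
    (with-output-to-string (out)
      (3bmd:parse-string-and-print-to-stream markdown out))))

;; The fence is built with FORMAT so each part sits on its own line.
(render-fenced (format nil "```lisp~%(defun foo ())~%```~%"))
```

All three variables are bound around both parsing and printing, so the settings do not leak into other conversions in the same image.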
- Loading `3bmd-ext-definition-lists.asd` adds support for parsing PHP Markdown Extra style definition lists: If `3bmd-definition-lists:*definition-lists*` is non-`NIL` while parsing, the following definition list will be recognized (see http://michelf.ca/projects/php-markdown/extra/#def-list):

      Term
      : definition
- Loading `3bmd-ext-tables.asd` adds support for parsing PHP Markdown Extra style tables: If `3bmd-tables:*tables*` is non-`NIL` while parsing, the following will be recognized as tables (see http://michelf.ca/projects/php-markdown/extra/#table):

      | Content Cell  | Content Cell  |
      | Content Cell  | Content Cell  |

      | First Header  | Second Header |
      | ------------- | ------------- |
      | Content Cell  | Content Cell  |
      | Content Cell  | Content Cell  |

      | Name          | Description              |
      | ------------- | -----------              |
      | Help          | Display the help window. |
      | Close         | Closes a window          |

      | Left-Aligned  | Center Aligned  | Right Aligned |
      | :------------ |:---------------:| -----:|
      | col 3 is      | some wordy text | $1600 |
      | col 2 is      | centered        |   $12 |
      | zebra stripes | are neat        |    $1 |

The following simplified table style is not supported, because it is ambiguous, especially without a heading:
```
First Header | Second Header
------------- | -------------
Content Cell | Content Cell
Content Cell | Content Cell
```
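
A small sketch of enabling the table extension for a single conversion, using the supported pipe-delimited style. It assumes the systems load through Quicklisp and that `3bmd:parse-string-and-print-to-stream` is the entry point; `render-table` is a hypothetical helper name.

```lisp
;; Assumption: both systems are available through Quicklisp.
(ql:quickload '("3bmd" "3bmd-ext-tables"))

(defun render-table (markdown)
  "Hypothetical helper: render MARKDOWN to an HTML string with
3bmd-tables:*tables* bound to a non-NIL value during parsing."
  (let ((3bmd-tables:*tables* t))
    (with-output-to-string (out)
      (3bmd:parse-string-and-print-to-stream markdown out))))

;; Each row keeps its leading and trailing | characters.
(render-table
 (format nil "~{~a~%~}"
         '("| Name  | Description              |"
           "| ----- | ------------------------ |"
           "| Help  | Display the help window. |"
           "| Close | Closes a window          |")))
```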
- Loading `3bmd-youtube.asd` adds support for embedding YouTube videos. If `3bmd-youtube:*youtube-embeds*` is non-`NIL` while parsing, the shorthand syntax `!yt[video-id(|options)]` can be used. For example:

      !yt[nbY-meOL57I]
      !yt[nbY-meOL57I|width=20,allowfullscreen]

- Loading `3bmd-ext-math.asd` adds support for math markup with libraries like MathJax. If `3bmd-math:*math*` is non-`NIL` while parsing, the shorthand syntax `$$ latex markup $$` can be used. For example:

      $$ \frac{\partial E}{\partial y} = \frac{\partial }{\partial y} \frac{1}{n}\sum_{i=1}^{n} (y_i - a_i)^2 $$