Proper regex when nesting elements? #3416

ivanjaros · 2024-08-17T10:35:02Z

I have read that when a block tag uses a fence such as ``` or :::, you should simply add more characters to the fence of the outer element when nesting.

So, for example:

:::foo
normal element
:::

::::foo
:::foo
nested element
:::
::::

But I am having a bit of trouble matching the tags with regex in my extensions.

As far as I know, the start function should simply be something like start(src) { return src.match(/^:::foo\n/)?.index; }. This only tells marked that this extension might be interested in processing the src.
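For reference, marked's extension documentation describes start as returning the index of the next potential start of the custom token, which marked uses to decide where a paragraph should be cut off. A regex anchored with ^ and no m flag can only ever report index 0. A minimal sketch with the m flag, assuming the :::foo fence from above (fooStart is only an illustrative name, not part of marked):

```javascript
// Hypothetical standalone version of the `start` hook, for illustration only.
// With the `m` flag, `^` matches at the beginning of any line, so a fence
// that appears later in `src` is reported as well.
function fooStart(src) {
  return src.match(/^:{3,}foo\n/m)?.index;
}

console.log(fooStart(':::foo\nbody\n:::'));            // 0
console.log(fooStart('some text\n:::foo\nbody\n:::')); // 10
console.log(fooStart('no fence here'));                // undefined
```

Allowing three or more colons here is also an assumption, so that the longer outer fences are found too.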

Later, in the tokenizer, I would have something like:

tokenizer(src, tokens) {
  const rule = /^:::foo\n([\s\S]*?)\n:::/;
  const match = rule.exec(src);
  if (match) {
    const token = {
      type: 'foo',
      raw: match[0],
      tokens: []
    };
    this.lexer.blockTokens(match[1], token.tokens);
    return token;
  }
},

Normally I would end the rule with \n to signal the end of the block, but then I would be unable to match the :::: version, because I would be looking for \n:::\n. Even without that, nested elements do not work properly. So my question is: what should these regexes look like to properly match the closing "tags" when nesting?

Maybe one thing to note is that in the case of a blockquote, the pattern is actually:

> quote
> > nested
> > quote

and it renders like this:

quote

nested
quote

PS: I can add :{3,} to the beginning and end of the patterns, but that does not help: it matches the opening tag of the parent, but it also matches the opening tag of the child as the parent's closing tag. So that is not working either.
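The problem described above can be reproduced outside marked. The sketch below runs the :{3,} rule on the nested sample and then compares it with one possible alternative, a backreference that forces the closing fence to repeat the opening fence exactly. The variable names and the backreference rule are illustrative assumptions, not something marked prescribes:

```javascript
const nested = '::::foo\n:::foo\nnested element\n:::\n::::';

// Naive rule: any run of 3+ colons closes the block, so the child's ":::"
// ends the parent early and the final "::::" is left over.
const naive = /^:{3,}foo\n([\s\S]*?)\n:{3,}/;
console.log(JSON.stringify(naive.exec(nested)[1]));
// ":::foo\nnested element"

// Backreference rule: \1 repeats the opening fence, (?!:) rejects a longer
// run of colons, and (?=\n|$) requires the fence to end its line.
const fenced = /^(:{3,})foo\n([\s\S]*?)\n\1(?!:)(?=\n|$)/;

const outer = fenced.exec(nested);
console.log(JSON.stringify(outer[2])); // ":::foo\nnested element\n:::"

// Feeding the captured body back in matches the child block.
const inner = fenced.exec(outer[2]);
console.log(JSON.stringify(inner[2])); // "nested element"
```

This only works when the parent and the child use different fence lengths, which is the convention described at the top of the question.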

Answered by ivanjaros

ivanjaros · 2024-08-17T11:38:09Z

I think I figured it out. Not sure if this is the "right" way to do it, but it is working:

const tabs = {
  name: 'tabs',
  level: 'block',
  start(src) { return src.match(/^:{3,}tabs\n/)?.index; },
  tokenizer(src, tokens) {
    // Count the colons of the opening fence. src may begin with "\n" or other
    // text, in which case there is no fence at the start and nothing to do.
    const count = /^:*/.exec(src)[0].length;
    if (count < 3) {
      return;
    }

    // The closing fence must use exactly as many colons as the opening one,
    // so a nested block with a shorter fence cannot end this block early.
    const pattern = `^:{${count}}tabs\\n([\\s\\S]*?)\\n:{${count}}(?!:)`;
    const rule = new RegExp(pattern);
    const match = rule.exec(src);

    if (match) {
      const token = {
        type: 'tabs',
        raw: match[0],
        tokens: []
      };
      // Tokenize the body as block content, then keep only the 'tab' children.
      this.lexer.blockTokens(match[1], token.tokens);
      token.tokens = token.tokens.filter(t => t.type === 'tab');
      return token;
    }
  },
};
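To check the counting approach without wiring it into marked, the same idea can be exercised on a plain string. This is only a sketch: fenceRule, leadingColons and fenceBodies are hypothetical helpers, the name is assumed to contain no regex metacharacters, and the (?!:) guard is there so that a longer run of colons is not accepted as the closing fence.

```javascript
// Build a rule whose closing fence has exactly `count` colons.
function fenceRule(count, name) {
  return new RegExp(`^:{${count}}${name}\\n([\\s\\S]*?)\\n:{${count}}(?!:)`);
}

// Number of colons at the very start of `src` (0 if there are none).
function leadingColons(src) {
  return /^:*/.exec(src)[0].length;
}

// Peel off nested fences from the outside in and collect each body.
function fenceBodies(src, name) {
  const bodies = [];
  let current = src;
  for (;;) {
    const count = leadingColons(current);
    if (count < 3) break;
    const match = fenceRule(count, name).exec(current);
    if (!match) break;
    bodies.push(match[1]);
    current = match[1];
  }
  return bodies;
}

const sample = '::::tabs\n:::tabs\nnested element\n:::\n::::';
console.log(fenceBodies(sample, 'tabs'));
// [ ':::tabs\nnested element\n:::', 'nested element' ]
```

In the real extension the recursion is done by this.lexer.blockTokens, which calls the tokenizer again on the captured body, so the loop here only stands in for that step.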
