Add lexer for MediaWiki Wikitext#2373
Conversation
jeanas
left a comment
There was a problem hiding this comment.
I've reviewed this until "wikilinks" only for now.
In general it looks good, congrats on the extensive tests. Please fix the cases of catastrophic backtracking, though.
It is actually a "transclusion modifier", but it should be OK to treat it as a parser function.
|
@jeanas Could you please give this another review? I have updated it according to the review, and my concerns were left above. |
jeanas
left a comment
There was a problem hiding this comment.
Generally LGTM, but I really suggest to make separate states (see my comment about redirects) for very long regexes such as the ones for pre/nowiki/math/chem/etc. It's pretty exhausting to review long regexes for catastrophic backtracking (we don't have tooling for it yet), and using an extra state makes it immediately clear that the backtracking doesn't occur (plus it's generally more readable IMHO).
This workaround produces slightly different tokens without any visible difference. I have manully reviewed them to ensure I didn't introduce any bugs.
|
@jeanas Done. Could you please check whether the catastrophic backtracking issue is solved? |
jeanas
left a comment
There was a problem hiding this comment.
Thanks, this looks much better. I agree it's a bit sad to make the lexer a little less correct, but it's quite annoying if invalid constructs (while editing, for example) cause the engine / editor / whatever is using Pygments to hang. Hopefully we'll switch from re to regex, which has ++ and the like to avoid this.
|
Uh-oh, we have a problem in the CI. I have to see if it's possible to make regexlint ignore these regexes. |
I have no idea why it is complaining about (?s) usage, perhaps it's bugged, but I still fixed it by using [\s\S].
I forgot about this. It still has some usage.
|
@jeanas The linter problem should be addressed now. |
|
Ah, thanks, I feared it would be too verbose, but looks fine. |
|
Thank you for your work! |
Hello! Can you tell me approximately when the next release (with this lexer) is planned? :) |
|
As soon as I get to it. With a bit of luck, this weekend or next weekend. We're in reasonably good shape. |
|
Super! I'm very glad to hear it :) Thank you! |
This PR implements #827.
Because Wikitext is a very extendable and inconsistent language, it is practically impossible to lex without given the configuration of a MediaWiki installation. This lexer will only catch common syntaxes and tags provided by common extensions.