-
Notifications
You must be signed in to change notification settings - Fork 84
fix: handle consecutive backslashes in no-reference-like-urls
#523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: handle consecutive backslashes in no-reference-like-urls
#523
Conversation
|
Hi @lumirlumir!, thanks for the Pull Request The pull request title isn't properly formatted. We ask that you update the pull request title to match this format, as we use it to generate changelogs and automate releases.
To Fix: You can fix this problem by clicking 'Edit' next to the pull request title at the top of this page. Read more about contributing to ESLint here |
no-reference-like-urls
| /** | ||
| * @import { SourceRange } from "@eslint/core" | ||
| * @import { Heading, Node, Paragraph, TableCell } from "mdast"; | ||
| * @import { Heading, Paragraph, TableCell } from "mdast"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Node type was not being used, so I removed it.
| /** Pattern to match both inline links: `[text](url)` and images: ``, with optional title */ | ||
| const linkOrImagePattern = | ||
| /(?<!(?<!\\)\\)(?<imageBang>!)?\[(?<label>(?:\\.|[^()\\]|\([\s\S]*\))*?)\]\((?<destination>[ \t]*(?:\r\n?|\n)?(?<![ \t])[ \t]*(?:<[^>]*>|[^ \t()]+))(?:[ \t]*(?:\r\n?|\n)?(?<![ \t])[ \t]*(?<title>"[^"]*"|'[^']*'|\([^)]*\)))?[ \t]*(?:\r\n?|\n)?(?<![ \t])[ \t]*\)(?!\()/gu; | ||
| /(?<=(?<!\\)(?:\\{2})*)(?<imageBang>!)?\[(?<label>(?:\\.|[^()\\]|\([\s\S]*\))*?)\]\((?<destination>[ \t]*(?:\r\n?|\n)?(?<![ \t])[ \t]*(?:<[^>]*>|[^ \t()]+))(?:[ \t]*(?:\r\n?|\n)?(?<![ \t])[ \t]*(?<title>"[^"]*"|'[^']*'|\([^)]*\)))?[ \t]*(?:\r\n?|\n)?(?<![ \t])[ \t]*\)(?!\()/gu; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this rule parses paragraphs and other nodes to search for links and images instead of checking link and image nodes directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I remember, there were some tricky edge cases that prevented us from directly visiting link and image nodes.
One edge case was:
markdown/tests/rules/no-reference-like-urls.test.js
Lines 82 to 91 in f80a9e1
| dedent` | |
| [link](<uri>) | |
| [uri]: https://example.com/uri | |
| `, | |
| dedent` | |
|  | |
| [uri]: https://example.com/uri | |
| `, |
markdown/tests/rules/no-reference-like-urls.test.js
Lines 556 to 577 in f80a9e1
| { | |
| code: dedent` | |
|  | |
| [<venus>]: https://example.com/venus.jpg | |
| `, | |
| output: dedent` | |
| ![Venus][<venus>] | |
| [<venus>]: https://example.com/venus.jpg | |
| `, | |
| errors: [ | |
| { | |
| messageId: "referenceLikeUrl", | |
| data: { type: "image", prefix: "!" }, | |
| line: 1, | |
| column: 1, | |
| endLine: 1, | |
| endColumn: 18, | |
| }, | |
| ], | |
| }, |
The link and image nodes can have links wrapped in brackets <>, but the AST just represents them as links without the brackets.
It wasn’t easy to handle this case by directly using the link and image nodes.
There’s also an edge case with auto-link literals in CommonMark and GFM mode:
markdown/tests/rules/no-reference-like-urls.test.js
Lines 92 to 99 in f80a9e1
| { | |
| code: dedent` | |
| <http://foo> | |
| [http://foo]: hi | |
| `, | |
| language: "markdown/gfm", | |
| }, |
markdown/tests/rules/no-reference-like-urls.test.js
Lines 100 to 107 in f80a9e1
| { | |
| code: dedent` | |
| http://foo | |
| [http://foo]: hi | |
| `, | |
| language: "markdown/gfm", | |
| }, |
These cases shouldn’t be reported either, but if we look at the link node directly, it can be pretty tricky to handle all of the edge cases correctly.
It would be nice to refactor this code to visit link and image nodes directly, but in my experience, it hasn’t been easy to implement.
So, I've thought the current approach is reasonable, especially since no-reversed-media-syntax and other rules for detecting error-prone syntax use similar strategies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we then visit link and image nodes directly and use the regex on their text to check if they're using the syntax this rule wants to check? I think it would be less error-prone then parsing whole raw paragraphs.
So, I've thought the current approach is reasonable, especially since
no-reversed-media-syntaxand other rules for detecting error-prone syntax use similar strategies.
I think no-reversed-media-syntax is a different case, because text with reversed syntax is not parsed as a link. Here, in no-reference-like-urls, we could use the fact that text we want to check is already parsed as link/image by the parser, if that's always the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your suggestion could help make this rule more reliable 👍
I agree with your idea, but the scope seems a bit large for this particular fix. It will also take me some time to figure out the best approach, so if it’s alright with you, I’d like to create a separate PR to refactor the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, then let's fix the problem this way. I could try to simplify rule to visit link and image nodes directly after this is merged.
mdjermanovic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Prerequisites checklist
What is the purpose of this pull request?
Which language are you using?
CommonMark and GFM.
What did you do?
I've tried using consecutive backslashes in the
no-reference-like-urlsrule, but it reported false positives as shown below:What did you expect to happen?
First Problem
The code below doesn't report, which is the correct behavior.
However, the code below does report, which are false positives:
Second Problem
The code below is reported as a
linktype, which is the correct behavior.However, The code below is reported as an
imagetype, which are incorrect behaviors.The code below should be reported as a
linktype, since the!is escaped with a backslash.Link to minimal reproducible Example
Adding the following test cases would help identify the issues:
(Test cases not mentioned here but added were included just to verify the behavior.)
valid:invalidWhat changes did you make? (Give an overview)
This PR is a follow-up to #490.
In this PR, I've addressed false positives in
no-reference-like-urlsrule that were caused by consecutive backslash handling.Previously, we used the following regex pattern to avoid matching backslash escapes:
(?<!(?<!\\)\\)However, this pattern is incorrect because they don't properly handle consecutive backslash escapes metioned above.
So, I've updated them to use the following regex correctly:
(?<=(?<!\\)(?:\\{2})*)Related Issues
Ref: #490
Is there anything you'd like reviewers to focus on?
In JavaScript,
\\\\\\\\(8 backslashes) is interpreted as\\\\(4 backslashes) in Markdown, and is ultimately recognized as\\(2 backslashes). In this case, an even number of backslashes is not treated as an escape character and is recognized as a normal backslash.