Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: bump Oniguruma-To-ES dep to support more grammars and simplify #836

Merged
merged 1 commit into from
Nov 18, 2024

Conversation

slevithan
Copy link
Contributor

@slevithan slevithan commented Nov 16, 2024

Description

  • Shiki 1.23.0
    • 178 supported, 18 mismatched, and 18 unsupported.
    • Forgiving: 183 supported, 31 mismatched, 0 unsupported.
  • With this PR 🎉
    • 192 supported, 10 mismatched, 13 unsupported.
    • Forgiving: 197 supported, 18 mismatched, 0 unsupported.

(The forgiving supported count is more accurate, since it's copying the Oniguruma engine's behavior of silently failing for regexes with invalid Oniguruma syntax.)

Changes that enabled this improvement are described in the Oniguruma-To-ES release notes for versions 0.3.0 and 0.4.0.

Is this still experimental? Probably. But I thought I'd raise the question since, although it can continue to improve in various ways (supporting a few more grammars, improving performance, etc.), IMO the foundations for a strong JS engine are now in place. The strongest claim for it still being experimental is that we use 'loose' accuracy instead of 'default' (allowing some inaccurate \G handling). But unless you plan to not use 'loose' (and drop the count of supported grammars), there will always be some inaccurate use of \G with some target strings.

Additional context

Feel free to undo or improve any of my documentation edits!

I removed the target: 'auto' handling because it's now built into Oniguruma-To-ES, and it auto-selects ES2025 for Node.js 23 and other compatible environments.

I also removed the simulation option because its last remaining functionality was removed. I was wrong earlier when I said I might need to bring some of its former handling back.

More context on simulation

The (^|\\\uFFFF) to (^|\\G) substitution it was doing was no longer having any effect on the number of supported grammars, but it actually was lowering the diff count somewhat for mismatched grammars. However, that slight improvement wasn't happening for good, predictable, or easily understandable reasons, which is why I'm removing it. Oniguruma-To-ES, in 'loose' accuracy mode, handles many \G anchors accurately but plays fast and loose with some others (like oniguruma-to-js did before it). As a result of this not-always-accurate handling, the effects of tweaking various things related to \G is unpredictable in whether it will improve or hurt the diff count. E.g., not applying Oniguruma-To-ES's advanced subclass-based \G handling improves diff count a little when using loose accuracy (very unexpected), but it dramatically hurts results if using default or strict accuracy. And adding additional pre-substitutions for additional ways that \G is used in patterns, like swapping (?!\\\uFFFF) with (?!\\G), hurts the diff count. So, it's only this specific case of (^|\\\uFFFF) that happens to slightly improve the numbers on-balance for non-obvious reasons related to which specific patterns are made sticky, etc.

However, I think we want to not focus too much on hyper-optimizing the diff count for specific samples with grammars that are mismatched anyway, and instead make things as predictable and easy to reason about as possible, especially now that the numbers are very solid without hacks like this.

What's actually going on with \\\uFFFF?

After investigating more what's going on in vscode-textmate, it's replacing \A and \G anchors with \<literal \uFFFF> when it wants \A and \G anchors to fail to match (for a couple different reasons), so it's correct to respect this and NOT bring some of these \Gs back. The vscode-textmate authors should have used something like (?!) to be more explicit and not have edge cases where it inappropriately does match certain strings, but no matter. In general, all of the various magical pattern substitutions in vscode-textmate (for \A, \G, \z, and backreferences) are implemented in hacky and flawed ways, e.g. not correctly accounting for things like escaped backslashes, enclosed numbered backrefs, etc. 😓

Copy link

netlify bot commented Nov 16, 2024

Deploy Preview for shiki-matsu ready!

Name Link
🔨 Latest commit d884cea
🔍 Latest deploy log https://app.netlify.com/sites/shiki-matsu/deploys/6739abd43e38eb00083a254e
😎 Deploy Preview https://deploy-preview-836--shiki-matsu.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

netlify bot commented Nov 16, 2024

Deploy Preview for shiki-next ready!

Name Link
🔨 Latest commit d884cea
🔍 Latest deploy log https://app.netlify.com/sites/shiki-next/deploys/6739abd4b31ac60008118c3f
😎 Deploy Preview https://deploy-preview-836--shiki-next.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@slevithan
Copy link
Contributor Author

Bumped oniguruma-to-es from v0.4.0 to v0.4.1 to work around a parser bug in Bun < v1.1.35.

@antfu antfu merged commit 4a9cd8a into shikijs:main Nov 18, 2024
11 of 12 checks passed
@slevithan slevithan deleted the js-engine-next branch November 22, 2024 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants