feat: bump Oniguruma-To-ES dep to support more grammars and simplify #836
Description
(The forgiving supported count is more accurate, since it's copying the Oniguruma engine's behavior of silently failing for regexes with invalid Oniguruma syntax.)
Changes that enabled this improvement are described in the Oniguruma-To-ES release notes for versions 0.3.0 and 0.4.0.
Is this still experimental? Probably. But I thought I'd raise the question since, although it can continue to improve in various ways (supporting a few more grammars, improving performance, etc.), IMO the foundations for a strong JS engine are now in place. The strongest claim for it still being experimental is that we use 'loose' accuracy instead of 'default' (allowing some inaccurate \G handling). But unless you plan to not use 'loose' (and drop the count of supported grammars), there will always be some inaccurate use of \G with some target strings.
Additional context
Feel free to undo or improve any of my documentation edits!
I removed the target: 'auto' handling because it's now built into Oniguruma-To-ES, and it auto-selects ES2025 for Node.js 23 and other compatible environments.
I also removed the simulation option because its last remaining functionality was removed. I was wrong earlier when I said I might need to bring some of its former handling back.
More context on simulation
The (^|\\\uFFFF) to (^|\\G) substitution it was doing was no longer having any effect on the number of supported grammars, but it actually was lowering the diff count somewhat for mismatched grammars. However, that slight improvement wasn't happening for good, predictable, or easily understandable reasons, which is why I'm removing it.
Oniguruma-To-ES, in 'loose' accuracy mode, handles many \G anchors accurately but plays fast and loose with some others (like oniguruma-to-js did before it). As a result of this not-always-accurate handling, the effects of tweaking various things related to \G are unpredictable: any given tweak might improve or hurt the diff count. E.g., not applying Oniguruma-To-ES's advanced subclass-based \G handling improves the diff count a little when using loose accuracy (very unexpected), but it dramatically hurts results when using default or strict accuracy. And adding pre-substitutions for other ways that \G is used in patterns, like swapping (?!\\\uFFFF) with (?!\\G), hurts the diff count. So, it's only this specific case of (^|\\\uFFFF) that happens to slightly improve the numbers on balance, for non-obvious reasons related to which specific patterns are made sticky, etc.
However, I think we want to not focus too much on hyper-optimizing the diff count for specific samples with grammars that are mismatched anyway, and instead make things as predictable and easy to reason about as possible, especially now that the numbers are very solid without hacks like this.
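For readers who haven't seen the removed code, here is a minimal sketch of what a pre-substitution like this amounts to. It is a reconstruction for illustration, not the code that was removed: the function name restoreLeadingG and the sample pattern are hypothetical, and only the two string literals come from the description above.

```typescript
// Illustrative sketch only; `restoreLeadingG` is a hypothetical name.
// The two literals mirror the ones quoted in the description:
// '(^|\\\uFFFF)' is the text "(^|" + backslash + U+FFFF + ")".
const SENTINEL_ALTERNATION = '(^|\\\uFFFF)';
const G_ALTERNATION = '(^|\\G)';

// Swap every exact occurrence of the sentinel alternation back to (^|\G).
// Other uses of the sentinel, such as (?!\<U+FFFF>), are left untouched.
function restoreLeadingG(pattern: string): string {
  return pattern.split(SENTINEL_ALTERNATION).join(G_ALTERNATION);
}

// Hypothetical grammar pattern after vscode-textmate disabled its \G anchor.
const disabled = '(^|\\\uFFFF)\\s*foo';
const restored = restoreLeadingG(disabled);
console.log(restored === '(^|\\G)\\s*foo');
```

Because the match is on one exact textual shape, it only ever affects patterns written in that form, which is part of why its effect on the diff count is hard to reason about.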
What's actually going on with \\\uFFFF?
After investigating more what's going on in vscode-textmate, it's replacing \A and \G anchors with \<literal \uFFFF> when it wants \A and \G anchors to fail to match (for a couple different reasons), so it's correct to respect this and NOT bring some of these \Gs back. The vscode-textmate authors should have used something like (?!) to be more explicit and not have edge cases where it inappropriately does match certain strings, but no matter. In general, all of the various magical pattern substitutions in vscode-textmate (for \A, \G, \z, and backreferences) are implemented in hacky and flawed ways, e.g. not correctly accounting for things like escaped backslashes, enclosed numbered backrefs, etc. 😓
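The two points above can be illustrated with plain JavaScript regexes. This is a sketch under stated assumptions: it uses the JS RegExp engine rather than Oniguruma, it is not vscode-textmate's actual code, and the function names disableGNaive and disableGEscapeAware are hypothetical. The first part shows that a backslash followed by a literal U+FFFF can still match an input containing that character, while (?!) cannot match anything. The second part shows how a substitution that ignores escaped backslashes can rewrite a pattern that never contained a \G anchor.

```typescript
// Illustrative only: plain JavaScript RegExp, not Oniguruma or vscode-textmate code.

// 1. Sentinel vs. an explicit never-matching assertion.
const sentinel = new RegExp('\\\uFFFF'); // pattern text: backslash + U+FFFF
const neverMatches = /(?!)/;             // empty negative lookahead

const oddInput = 'a\uFFFFb';
const sentinelHit = sentinel.test(oddInput);  // true: the edge case
const neverHit = neverMatches.test(oddInput); // false for every input

// 2. Naive vs. escape-aware replacement of \G (hypothetical helpers).
// Naive: replaces any backslash + "G", even when the backslash is itself escaped.
function disableGNaive(pattern: string): string {
  return pattern.split('\\G').join('\\\uFFFF');
}

// Escape-aware: consumes backslash pairs left to right, so an escaped
// backslash is treated as one unit and the "G" after it is left alone.
// (Still a sketch: it does not special-case character classes.)
function disableGEscapeAware(pattern: string): string {
  return pattern.replace(/\\([^])/g, (pair: string, ch: string) =>
    ch === 'G' ? '\\\uFFFF' : pair
  );
}

// Pattern text \\G: an escaped backslash followed by the literal letter G.
const literalG = '\\\\G';
console.log(sentinelHit, neverHit);
console.log(disableGNaive(literalG) === literalG);       // false: pattern was changed
console.log(disableGEscapeAware(literalG) === literalG); // true: pattern preserved
```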