Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(parser): offset internal index locations by startIndex #16936

Conversation

DylanPiercey
Copy link
Contributor

@DylanPiercey DylanPiercey commented Oct 25, 2024

#16849 introduced a regression (SORRY!)

The regression is that after the change all index properties in node correctly reflect the offset, however many places in the parser use node indexes interchangeably with the current parser index.

I have tried to painstakingly review every piece of code in the parser which is directly or indirectly comparing source indexes to an offset index and vice versa and update those locations to map the index accordingly.

The original downstream issue I noticed from this was that the raw property on some ast nodes was incorrect since it was slicing the input string based on the node indexes (which are now offset based on startIndex/startColumn).

Q                       A
Fixed Issues? Fixes #1, Fixes #2
Patch: Bug Fix? 👍
Major: Breaking Change?
Minor: New Feature?
Tests Added + Pass?
Documentation PR Link
Any Dependency Changes?
License MIT

Note this probably needs more tests... But I don't have the bandwidth right now.

It may make sense to revert #16849 (although that's somewhat trading one bug for another 😢)

@babel-bot
Copy link
Collaborator

babel-bot commented Oct 25, 2024

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/58269

@DylanPiercey DylanPiercey force-pushed the fix-slicing-input-without-start-index-offset branch from 63c4dd3 to b0825c4 Compare October 25, 2024 18:23
@liuxingbaoyu
Copy link
Member

Don't worry, this seems to be just a bug with the new option rather than a regression?

@DylanPiercey
Copy link
Contributor Author

DylanPiercey commented Oct 25, 2024

@liuxingbaoyu the regression is that this happens even if you pass startColumn (which existed previously).
The assumption in the previous changes is that updating the indexes of the nodes to be relative to startColumn was a fix (which i'd say it still is) but the problem is the parser is relying on the indexes being the original source indexes and not the offset ones.

This is because startIndex is default to startColumn if startLine is 1 (https://github.com/babel/babel/pull/16849/files#diff-3013601014749725fe2696156b2db73cf0342437721d18214a3a69734246512eR120)

@nicolo-ribaudo
Copy link
Member

Thanks for noticing the regression and quickly opening a PR :)

@liuxingbaoyu liuxingbaoyu added PR: Bug Fix 🐛 A type of pull request used for our changelog categories pkg: parser i: regression labels Oct 25, 2024
@DylanPiercey
Copy link
Contributor Author

@nicolo-ribaudo I'm happy to help, although this is a bit like the "spider man points at self" meme 😆.

One thing to note is that Marko is built on the babel compiler (and uses the startColumn api to process nested script code) which is where I saw the regression. With this regression it's going to break a lot of Marko apps unless I either pin the Babel dependency or switch the Marko compiler to avoid using the startColumn api. At one point we were using a hack of passing a empty string with newlines to babel to simulate the startColumn api however this ended up being terrible for performance when parsing some really large templates so we've been trying to switch.

All that is to say I need to do something on the Marko side quickly and I'm thinking the simplest thing is to avoid using startColumn for now.

Do you think this PR is something that could land quickly? I understand if not given lack of tests and such.
Maybe another option is removing the default here? https://github.com/babel/babel/pull/16849/files#diff-3013601014749725fe2696156b2db73cf0342437721d18214a3a69734246512eR120 (Although ideally this is all just fixed 😢)

Sorry to bug you all!

@liuxingbaoyu
Copy link
Member

image
I searched for .slice(, and it seems there are still some to deal with.
We are generally quick to land regression fixes, but I am worried that this does not fix all cases.

@DylanPiercey
Copy link
Contributor Author

DylanPiercey commented Oct 25, 2024

@liuxingbaoyu I did try to look for all references to this.input and also how indexes were compared.

In the case of the screenshot it is a non issue since chunkStart and this.state.pos are indexes relative to the input code (and not offset with startIndex).

@DylanPiercey DylanPiercey force-pushed the fix-slicing-input-without-start-index-offset branch from b0825c4 to 2c30427 Compare October 25, 2024 19:05
@DylanPiercey DylanPiercey force-pushed the fix-slicing-input-without-start-index-offset branch from 2c30427 to e8d74ca Compare October 25, 2024 19:17
return;
}
}

const difference = new Difference(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @liuxingbaoyu for working on this test!

Below in the file, when we are fuzzing and using random line/col/index, test.original will be false and thus the if (!test.original) return will suppress any error. Should we replace it with

  if (!test.original) {
    const err = new Error();
    err.message = `Test Failed: ${testLocation}\nFixtureError.fromDifference: ${
      FixtureError.fromDifference(difference, actual).message
    }`;
    throw err;
  }

import { parse } from "../lib/index.js";
import { fileURLToPath } from "url";

runFixtureTests(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And no need for this file, we can just use the TEST_FUZZ=true env var we already have to test these cases.

@nicolo-ribaudo
Copy link
Member

I think once @liuxingbaoyu's tests are working, if they are passing this PR is good to land.

curPosition: () => Position;
curPosition: State["curPosition"];
sourceToOffsetPos: State["sourceToOffsetPos"];
offsetToSourcePos: State["offsetToSourcePos"];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lookaheadState should be kept minimal for performance. I guess we didn't call sourceToOffsetPos or offsetToSourcePos on a lookahead state, if that is true, then we don't have to include them in the lookahead state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JLHwung fyi without this the tests are failing and so it seems necessary with the changes in the PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move those functions from State to the parser itself maybe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we inline sourceToOffsetPos in curPosition, the test passes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit to move them.

actual.ast &&
expected.ast &&
!actual.ast.errors &&
!options.plugins?.includes("estree") &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ?. is causing the Node.js 12 failure.

@DylanPiercey
Copy link
Contributor Author

You all are awesome btw ❤️

@nicolo-ribaudo
Copy link
Member

Waiting for a ✔️ by @liuxingbaoyu (or even better, @JLHwung since @liuxingbaoyu wrote half of the code) and then I can release :)

Copy link
Contributor

@JLHwung JLHwung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thank you.

@nicolo-ribaudo nicolo-ribaudo changed the title fix(babel-parser): all index locations from nodes need to be offset by the startIndex fix(parser): offset internal index locations by startIndex Oct 25, 2024
@nicolo-ribaudo nicolo-ribaudo merged commit 836528a into babel:main Oct 25, 2024
54 checks passed
@nicolo-ribaudo
Copy link
Member

Releasing now (https://github.com/babel/babel/actions/runs/11526141307)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
i: regression pkg: parser PR: Bug Fix 🐛 A type of pull request used for our changelog categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants