Revert comment attachment to 1.2 behavior? #1024

mikesherov · 2015-02-12T21:39:02Z

Moving the discussion from https://code.google.com/p/esprima/issues/detail?id=607

To summarize, even if the comment attachment algorithm is less ambiguous in 2.0, it was more useful in 1.2 to the library authors who were using it, and was a non BC change for eslint.

I believe that we should revert to the 1.2 behavior, and revisit the change in behavior for 3.0.0, and discuss the design decision further as a team.

@ariya @jeffmo thoughts?

ariya · 2015-02-12T21:57:49Z

I think first we need to gather some pros and cons of each comment attachment strategy.

jeffmo · 2015-02-12T22:06:06Z

I think to be totally general, we'd have to take cases on a case-by-case basis, make a "reasonable" decision, document the decision, and have a regression test to make sure the parser stays consistent.

Attaching comments to nodes does indeed get tricky in some circumstances. The purist in me is inclined to suggest that we don't even try to do this in the parser because it's possible to define different intent for all sorts of use cases. Some people will prefer absolutist consistency (closest, lowest node in the tree), and others will prefer "sanely most reasonable [for my particular use case]". I don't know that we can build a general solution into the parser that works well for all possible scenarios.

To copy/paste the examples in the previous issue here just for reference locality:

var o = {
   /** Desc */
   foo: function(){}
}

Should the comment attach to the Property.key node? or the Property node?

What if we change it up a little bit to be semantically the same, but syntactically slightly different?:

var o = {
  /** Desc */ foo:
    function() {}
}

Or there's the switch-case scenario as outlined in the previous google-code issue:

switch (x) {
  /*foo*/ case 42: /*bar*/ answer();
}

Does /*foo*/ belong to the entire SwitchCase node? Or the (SwitchCase, consequent) tuple? I would say probably the SwitchCase node...but some tooling may find it useful to see this information sit as a leaf of the tuple.

In conclusion: I'd propose that we should just publish a list of comments (each with location information) so that users of the data structure can decide on how those comments associate with nodes themselves.

I'd love to hear from @nzakas and others about how feasible this would be for their use cases (and if not feasible, what alternative they would propose? Or is tested-consistency of a potentially arbitrary decision on the corner-cases like SwitchCase and Property good enough?)

mikesherov · 2015-02-12T22:10:51Z

I think first we need to gather some pros and cons of each comment attachment strategy.

I agree. I think @nzakas laid it out fairly well here:

The problem with this approach is that it makes locating comments much more difficult in the AST. With the 1.2.x approach, given any node, I can find the nearest related comment simply by going to the parent, then grandparent, etc., until I find a node with leading comments. With the 2.x approach, for every ancestor node, I need to check the type to see if it's one of the types where the comment is attached on one of its children or not.

I'd argue the way it is in 1.2.x is more useful., and also that it makes more sense In your example, the comment applies to the whole property (including the value), not just the key.

@jeffmo there is already a comments array, and it's actually non-trivial to relate the two.

ariya · 2015-02-12T22:14:42Z

This is why I think we need to think through this carefully. The benefit of the "next smallest node" approach is because you can figure out the "next largest node" approach (bottom-top traversal). Doing it the other way around might not be easy.

I agree with @jeffmo that we shall consider also the different downstream tools, e.g. documentation tool from comment annotation, code coverage that might want to skip a certain node from the checks, etc.

fkling · 2015-02-12T22:20:22Z

cc @benjamn, seems to be relevant for (at least related to) recast (which already has a comment attachment strategy).

jeffmo · 2015-02-12T22:20:56Z

@jeffmo there is already a comments array, and it's actually non-trivial to relate the two.

Indeed -- however, this is what we've settled on internally (as an n=1 anecdote). We sort the comments spatially and, when we hit a node while traversing, we decide then and there which comment we'd prefer to analyze or use (depending on what the purpose of the traversal is). Some traversals are only looking for function docblocks, so they only care about docblocks near the property and can find the closest pre-fixed comment given either the Property node or the Property.key node.

My suggestion that maybe this just doesn't belong in the parser was an allusion to the fact that such tooling could exist outside it (in a shared way) to enable people to make use of the comments list in a way that suits them.

mikesherov · 2015-02-12T22:22:03Z

This is why I think we need to think through this carefully.

That's why I reopened the issue for discussion, now that there is a large team to consider the ramifications of both ways.

mikesherov · 2015-02-12T22:23:30Z

My suggestion that maybe this just doesn't belong in the parser was an allusion to the fact that such tooling could exist outside it (in a shared way) to enable people to make use of the comments list in a way that suits them.

Indeed, this is what JSCS does. It does not use comment attachment and just relates spatially. @nzakas perhaps you can weigh in on why ESLint doesn't do it that way, other than the fact that it was there and the way you wanted it at the time?

jeffmo · 2015-02-12T22:23:59Z

Some traversal are things like ESLint, and they only care about a thing that's near either the start of the Property...whereas others care about comments near the Property.key. In a spatially-sorted structure of comments, people can find nearby comments without having to be compatible with where the parser arbitrarily decided to assign the comment.

mikesherov · 2015-02-12T22:27:03Z

@jeffmo that's totally true. In the case of comment attachment, perhaps there is just no good strategy, and we should stick with whatever one suits us.

HOWEVER. the difference is 4 lines... perhaps we can just provide a flag here so downstream can avoid double parse? https://github.com/jquery/esprima/pull/292/files

ariya · 2015-02-12T22:30:22Z

Post-parsing attachment is possible, estraverse is doing this. However, there is a performance penalty and hence there was a need to include it as part of the parsing step. Maybe it is time to revisit the extra overhead to see if it is either tolerated or fixable.

ariya · 2015-02-12T22:33:38Z

@mikesherov It's not easy to quantify those 4 lines, primarily because we don't have enough test coverage for comment attachment.

ariya · 2015-02-12T22:53:17Z

JFYI, the change in the comment attachment does not break Istanbul test suite.

ariya · 2015-02-12T22:56:16Z

Although I must say that Istanbul does it like JSCS (get the array and handle the comments spatially), as opposed to relying on the attached one(s).

mikesherov · 2015-02-12T23:05:00Z

@mikesherov It's not easy to quantify those 4 lines, primarily because we don't have enough test coverage for comment attachment.

@ariya, that's true. But then the task becomes: "provide a flag for both forms of comment attachment" with a dependency of "write more tests to cover both cases."

First we must consider what we want to do, then if it is feasible and how.

mikesherov · 2015-02-12T23:13:00Z

Because I would be willing to write a whole bunch of tests to do this if we can agree that we can move forward be creating a flag for which comment attachment behavior to use.

If we discover more complexities here, then we can reconsider, but we had it both ways in different stable versions of Esprima, I'm not sure why we can't provide a flag and mark comment attachment as experimental / unstable.

benjamn · 2015-02-12T23:58:48Z

Though I was the person who ported the master-branch comment attachment code to the harmony branch, in hopes of using it in Recast, I have not ended up using that feature of Esprima, for a couple of reasons:

I have my own comment attachment algorithm, which involves a binary search among siblings at each level of the tree, and has the nice property of taking less time if there are fewer comments.
There are fundamental ambiguities to comment attachment, and I need precise control over how those ambiguities are resolved.

For my purposes, getting a list of comments (with .loc information) attached to the root Program node is really the most useful thing Esprima can do. Anything else is too opinionated to be usable.

To give an example of the ambiguity I'm talking about, I think dividing comments into leadingComments and trailingComments is a false distinction, since there are comments that are neither leading nor trailing, such as this one: [1, /*hole*/, 2]. What node would you attach that comment to? Maybe the ArrayExpression? But then how would you record that it appeared in the space between the two commas? It's tempting to consider reifying comments as AST nodes, but comments can appear in places where no node can appear, so that won't work either.

No matter how I approach this problem, I can't help thinking that a parser is not the kind of software that should be responsible for juggling a bunch of imperfect heuristics. If we can't solve a problem thoroughly and efficiently, we should stick to providing users of the library tools for attempting to solve the problem themselves.

Which is all to say: if there are people using the current comment attachment algorithm, let them keep using it. The last thing we need is to spend time supporting multiple, slightly different strategies.

nzakas · 2015-02-13T00:42:34Z

Sorry, trying to catch up on all the notes in here.

Why not do it in post-processing? I tried doing this when we tried using esprima-fb, and it turned into a nightmare. es-traverse is different from what esprima 1.2, so we couldn't just use that. I spent a lot of time trying to make this work and it took a lot of code, and edge cases just kept popping up. There's also a perf hit doing it in post-processing because it requires a whole extra traversal to deposit the comments in the right spot. One of the big perf bottlenecks for ESLint is how many times we have to do a traversal of the AST before we are done. When esprima added comment attachment, it gave us a nice boost and simplified a bunch of code.

Yeah, in theory, we could go back to using estraverse's comment attachment, and then we'd take a perf hit and need to change around our comment detection again. I'd be pretty dissatisfied with that outcome because besides affecting us, it could potentially affect the developers of third-party ESLint rules in the wild. So as someone trying to provide a stable environment on which those third-party rules can be created, I really need to have the 1.2 behavior one way or another.

nzakas · 2015-02-13T02:38:46Z

Follow-up: I dug into the code a bit and I'm reasonably convinced it's possible to extract the comment attachment code from Esprima and provide a hook for people to plug in their own function to do it. That could solve both the problem of not wanting the parser to own that functionality and my problems related to post processing.

All Esprima would have to do is keep extra.trailingComments as-is and pass extra to a function specified during parse(). If people are open to that, I can put together a POC.

mikesherov · 2015-02-13T03:05:29Z

@nzakas, that seems reasonable to me. @benjamn, @jeffmo, thoughts?

mikesherov · 2015-02-13T03:06:27Z

cc @ariya. Worth investigating, no? Fits in with longer term plan of exposing subparse functions?

michaelficarra · 2015-02-13T04:16:06Z

👍

ariya · 2015-02-13T04:21:53Z

Any information needed within that extra instance? That object is supposed to be internal only, exposing this will risk some internal refactoring that is necessary for simplifying the state.

ariya · 2015-02-13T04:26:06Z

@nzakas Do you recall the different perf numbers between built-in attached comment vs doing it as another post-processing pass? Perhaps the new algorithm from @benjamn will fit the use case and can be fast enough to re-evaluate the overhead numbers.

ariya · 2015-02-13T04:28:59Z

@mikesherov I'm not sure how this is related to exposing subparse function. Care to elaborate?

mikesherov · 2015-02-13T12:07:27Z

@ariya I just meant its tangentially related in the spirit of exposing functionality at the risk of losing refactor ability.

nzakas · 2015-02-13T16:31:19Z

I don't recall the perf numbers, but again, my main concern is stability of output right now.

I can see how much of extra is necessary to make this work. I know it's intended to be internal only, but so many functions rely on it that it's going to be hard to avoid exposing it if extensibility is a goal.

mikesherov · 2015-02-13T16:47:55Z

So, then perhaps we should close this for now, and just say that it was premature to move comment attachment not in a major release now that we're following semver, and that our "test downstream libs" issue will be the way we address this going forward. @nzakas cool?

ariya · 2015-02-13T19:24:13Z

Just for the sake completeness, let me write down another workaround I've mentioned to someone (don't remember whom, sorry).

It's always possible to "promote" the attached comment upwards, e.g. from Property.key to Property, SwitchCase.test to SwitchCase, etc on a case by case basis. It is annoying since yet another traversal is needed. However, it is still useful to "patch" the syntax tree like that whenever a deviation from the opinionated attachment is necessary.

Maybe this is completely useless, but I want to make sure that doing a full-blown attachment from the comment array is not the only possibility.

nzakas · 2015-02-13T23:49:31Z

@mikesherov I'm cool with that, maybe open a new issue for exploring externalizing comment attachment? I'll happily dig in on that and report back.

mikesherov · 2015-02-17T02:17:14Z

OK, so looks like we beat this horse dead. Thanks everyone for contributing to the discussion. @nzakas please open a new issue for externalizing comment attachment at your leisure.

mikesherov added defect harmony branch labels Feb 12, 2015

mikesherov closed this as completed Feb 17, 2015

mikesherov mentioned this issue Feb 17, 2015

Change where comments are attached back to 1.2 branch location. #292

Closed

benjamn mentioned this issue Feb 25, 2015

Support leadingComments and trailingComments, like Esprima? benjamn/recast#159

Closed

xmo-odoo mentioned this issue Jul 14, 2017

Add basic comments support to the AST PiotrDabkowski/pyjsparser#15

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revert comment attachment to 1.2 behavior? #1024

Revert comment attachment to 1.2 behavior? #1024

mikesherov commented Feb 12, 2015

ariya commented Feb 12, 2015

jeffmo commented Feb 12, 2015

mikesherov commented Feb 12, 2015

ariya commented Feb 12, 2015

fkling commented Feb 12, 2015

jeffmo commented Feb 12, 2015

mikesherov commented Feb 12, 2015

mikesherov commented Feb 12, 2015

jeffmo commented Feb 12, 2015

mikesherov commented Feb 12, 2015

ariya commented Feb 12, 2015

ariya commented Feb 12, 2015

ariya commented Feb 12, 2015

ariya commented Feb 12, 2015

mikesherov commented Feb 12, 2015

mikesherov commented Feb 12, 2015

benjamn commented Feb 12, 2015

nzakas commented Feb 13, 2015

nzakas commented Feb 13, 2015

mikesherov commented Feb 13, 2015

mikesherov commented Feb 13, 2015

michaelficarra commented Feb 13, 2015

ariya commented Feb 13, 2015

ariya commented Feb 13, 2015

ariya commented Feb 13, 2015

mikesherov commented Feb 13, 2015

nzakas commented Feb 13, 2015

mikesherov commented Feb 13, 2015

ariya commented Feb 13, 2015

nzakas commented Feb 13, 2015

mikesherov commented Feb 17, 2015