Description
Problem
It's possible a user will have a workflow like:
- write grammar code
- run `tree-sitter test`
- scan through the test failures for easy wins, fix those, then move on to the harder failures, and finally tackle failures where there are actual ERROR nodes mid-parse-tree
This can be challenging with the way `tree-sitter test` currently just dumps failures to stdout.
Expected behavior
It would be nice to have a command or flag like `tree-sitter test --report` that instead builds an HTML report of the tests and pipes it to a browser with something like bcat (or dumps it to a file that can be opened, though that adds friction).
I've begun writing an NPM package, which could be added to a tree-sitter grammar package as a devDependency, that I found extremely useful for ironing out bugs in my tree-sitter-unison package. It's invoked with something like `tree-sitter test 2>&1 | tree-sitter-test-report`, where the report package parses the combined stdout and stderr to generate a table of results. (Attached is a screenshot of an ugly first draft that I used over the past few days and found very helpful.)
However, this is limited to what `test` sends to stdout and stderr. It'd be cool if you could expand any row, pass or failure, to see the underlying test, including the expected result.
I'm able to implement this for failures in my JS package, but since successes don't get sent to stdout, and the test results don't include the filenames containing the tests, I can't cleanly pull the expected output from the file in an external package. That's why I think it'd be nice to have this in tree-sitter itself, as part of the Rust code. I think it'd be a real boon to people developing grammars to have a comprehensive, easy-to-read overview of their parsing tests.
Plan
I'm thinking this would be ideal, but I'm hoping for feedback:

- `tree-sitter test --report` runs the tests and dumps JSON data to stdout. This data contains the filename for a test suite plus its children, which are (presumably) a list of tests with a status (`pass`, `fail`, `skipped`, `no_platform` (??)) and, if there's a failure, the diff of expected and actual (see the sketch after this list). This should cover:
  - corpus tests
  - highlighting tests
  - any other test types
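
For concreteness, here's a minimal sketch of how that JSON shape could be modeled on the Rust side with serde. All type and field names here are hypothetical, chosen only to illustrate the fields described above; none of this exists in the tree-sitter CLI today.

```rust
// Minimal sketch only; requires serde = { version = "1", features = ["derive"] }
// and serde_json = "1". Names are illustrative, not actual tree-sitter code.
use serde::Serialize;

#[derive(Serialize)]
#[serde(rename_all = "snake_case")]
enum TestStatus {
    Pass,
    Fail,
    Skipped,
    NoPlatform,
}

#[derive(Serialize)]
struct TestResult {
    name: String,
    status: TestStatus,
    /// Diff of expected vs. actual output; only present on failure.
    #[serde(skip_serializing_if = "Option::is_none")]
    diff: Option<String>,
}

#[derive(Serialize)]
struct TestSuite {
    /// Path of the file containing this suite's tests.
    file: String,
    tests: Vec<TestResult>,
}

fn main() {
    let suite = TestSuite {
        file: "test/corpus/expressions.txt".into(),
        tests: vec![TestResult {
            name: "function application".into(),
            status: TestStatus::Fail,
            diff: Some("- (expected)\n+ (actual)".into()),
        }],
    };
    // Dump the suite as JSON, as `tree-sitter test --report` might.
    println!("{}", serde_json::to_string_pretty(&suite).unwrap());
}
```

With something like this, an external tool such as tree-sitter-test-report could consume the JSON directly instead of scraping stdout and stderr.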