Description
Short Description
When scanning, files are sometimes referenced from other files, for example "see file LICENSE in this source tree."
The JSON output then includes the scan results for the referenced file LICENSE as well (which is great), but it doesn't indicate that these results come from a referenced file. Adding something like
"is_referenced": "yes"
would make it a lot easier to find files that were referenced in the scan results, and also very easy to spot when inspecting the results, or to highlight in a UI.
Possible Labels
- new feature
Select Category
- Enhancement
- Add License/Copyright
- Scan Feature
- Packaging
- Documentation
- Expand Support
- Other
Describe the Update
An example is the following output from a scan of BusyBox:
"matches": [
  {
    "license_expression": "gpl-2.0-plus",
    "license_expression_spdx": "GPL-2.0-or-later",
    "from_file": "busybox-1.37.0/editors/awk.c",
    "start_line": 7,
    "end_line": 7,
    "matcher": "2-aho",
    "score": 100.0,
    "matched_length": 12,
    "match_coverage": 100.0,
    "rule_relevance": 100,
    "rule_identifier": "gpl-2.0-plus_544.RULE",
    "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-2.0-plus_544.RULE"
  },
This rule has a list of referenced_filenames, for which the scan results are included:
  {
    "license_expression": "gpl-2.0",
    "license_expression_spdx": "GPL-2.0-only",
    "from_file": "busybox-1.37.0/LICENSE",
    "start_line": 1,
    "end_line": 6,
    "matcher": "2-aho",
    "score": 100.0,
    "matched_length": 45,
    "match_coverage": 100.0,
    "rule_relevance": 100,
    "rule_identifier": "gpl-2.0_only2.RULE",
    "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-2.0_only2.RULE"
  },
  {
    "license_expression": "gpl-2.0",
    "license_expression_spdx": "GPL-2.0-only",
    "from_file": "busybox-1.37.0/LICENSE",
    "start_line": 9,
    "end_line": 348,
    "matcher": "2-aho",
    "score": 100.0,
    "matched_length": 2931,
    "match_coverage": 100.0,
    "rule_relevance": 100,
    "rule_identifier": "gpl-2.0_1022.RULE",
    "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-2.0_1022.RULE"
  }
],
To find out which files were referenced, I currently have to walk all the matches, extract from_file, and compare it with the path of the file being scanned; if the two differ, the match comes from a referenced file. I would rather see something like a tag that says is_referenced or something similar, as that makes it easier to walk the results, and it is a lot more logical from a programming point of view ("walk the results, report all the files that have is_referenced"). Even a simple grep (which I often use on scancode results) would then be possible.
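For illustration, the current workaround could look roughly like the sketch below. It assumes the scan JSON has a top-level "files" list whose entries carry a "path" and a "license_detections" list, each with its own "matches"; that layout is an assumption and may differ between ScanCode versions. The function name referenced_files and the sample data are made up for this example.

```python
def referenced_files(scan):
    """Map each scanned path to the set of other files its matches came from.

    Assumes (not guaranteed) the layout:
    scan["files"][i]["license_detections"][j]["matches"][k]["from_file"]
    """
    referenced = {}
    for resource in scan.get("files", []):
        path = resource.get("path")
        for detection in resource.get("license_detections", []):
            for match in detection.get("matches", []):
                from_file = match.get("from_file")
                # A match whose from_file differs from the resource path
                # was pulled in from a referenced file.
                if from_file and from_file != path:
                    referenced.setdefault(path, set()).add(from_file)
    return referenced


# Hypothetical, heavily trimmed scan result modelled on the BusyBox excerpt.
sample = {
    "files": [
        {
            "path": "busybox-1.37.0/editors/awk.c",
            "license_detections": [
                {
                    "matches": [
                        {"from_file": "busybox-1.37.0/editors/awk.c"},
                        {"from_file": "busybox-1.37.0/LICENSE"},
                        {"from_file": "busybox-1.37.0/LICENSE"},
                    ]
                }
            ],
        }
    ]
}

for path, sources in referenced_files(sample).items():
    print(path, "references", sorted(sources))
```

With an explicit is_referenced flag on each match, the path comparison in the inner loop would reduce to a single key lookup.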
How This Feature will help you/your organization
Quicker filtering and processing of interesting results.
Possible Solution/Implementation Details
Adding is_referenced, or something like it, should be a straightforward solution.