Add "is_referenced" to files that were referenced (such as LICENSE) #4858

@armijnhemel

Short Description

During a scan, some files turn out to be referenced from other files, for example through a notice such as "see file LICENSE in this source tree."

The JSON output then adds the scan results for the file LICENSE as well (which is great) but it doesn't indicate that this was a file that was referenced. Adding something like

"is_referenced": "yes"

would make it a lot easier to find files that were referenced in the scan results, and also very easy to spot when inspecting the results, or to highlight in a UI.

Possible Labels

  • new feature

Select Category

  • Enhancement
  • Add License/Copyright
  • Scan Feature
  • Packaging
  • Documentation
  • Expand Support
  • Other

Describe the Update

An example is the following output from a scan of BusyBox:

          "matches": [
            {
              "license_expression": "gpl-2.0-plus",
              "license_expression_spdx": "GPL-2.0-or-later",
              "from_file": "busybox-1.37.0/editors/awk.c",
              "start_line": 7,
              "end_line": 7,
              "matcher": "2-aho",
              "score": 100.0,
              "matched_length": 12,
              "match_coverage": 100.0,
              "rule_relevance": 100,
              "rule_identifier": "gpl-2.0-plus_544.RULE",
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-2.0-plus_544.RULE"
            },

This rule has a list of referenced_filenames, for which the scan results are included:

            {
              "license_expression": "gpl-2.0",
              "license_expression_spdx": "GPL-2.0-only",
              "from_file": "busybox-1.37.0/LICENSE",
              "start_line": 1,
              "end_line": 6,
              "matcher": "2-aho",
              "score": 100.0,
              "matched_length": 45,
              "match_coverage": 100.0,
              "rule_relevance": 100,
              "rule_identifier": "gpl-2.0_only2.RULE",
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-2.0_only2.RULE"
            },
            {
              "license_expression": "gpl-2.0",
              "license_expression_spdx": "GPL-2.0-only",
              "from_file": "busybox-1.37.0/LICENSE",
              "start_line": 9,
              "end_line": 348,
              "matcher": "2-aho",
              "score": 100.0,
              "matched_length": 2931,
              "match_coverage": 100.0,
              "rule_relevance": 100,
              "rule_identifier": "gpl-2.0_1022.RULE",
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-2.0_1022.RULE"
            }
          ],

To find out which files were referenced, I currently have to walk all the matches, extract from_file, and compare it with the path of the file being reported; if the two differ, the match came from a referenced file. I would rather see a tag such as is_referenced, because that makes the results easier to walk and is more logical from a programming point of view ("walk the results, report all the files that have is_referenced"). Even a simple grep (which I often use on scancode results) would then be possible.
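The walk described above can be sketched as follows. This is a minimal illustration, not ScanCode code: it assumes the JSON output has a top-level "files" list whose entries carry "path" and "license_detections", each detection holding the "matches" shown in the excerpt. The function name find_referenced_files and the abbreviated sample data are made up for this example.

```python
def find_referenced_files(scan):
    """Map each scanned path to the other files its matches came from.

    Assumes the layout files[] -> license_detections[] -> matches[],
    where every match carries a "from_file" value.
    """
    referenced = {}
    for resource in scan.get("files", []):
        path = resource.get("path")
        for detection in resource.get("license_detections", []):
            for match in detection.get("matches", []):
                from_file = match.get("from_file")
                # A match whose from_file differs from the path of the
                # resource being reported came from a referenced file.
                if from_file and from_file != path:
                    referenced.setdefault(path, set()).add(from_file)
    return {path: sorted(files) for path, files in referenced.items()}


# Abbreviated from the BusyBox excerpt; only the fields used here are kept.
sample_scan = {
    "files": [
        {
            "path": "busybox-1.37.0/editors/awk.c",
            "license_detections": [
                {
                    "matches": [
                        {"from_file": "busybox-1.37.0/editors/awk.c",
                         "rule_identifier": "gpl-2.0-plus_544.RULE"},
                        {"from_file": "busybox-1.37.0/LICENSE",
                         "rule_identifier": "gpl-2.0_only2.RULE"},
                        {"from_file": "busybox-1.37.0/LICENSE",
                         "rule_identifier": "gpl-2.0_1022.RULE"},
                    ]
                }
            ],
        }
    ]
}

if __name__ == "__main__":
    for path, files in find_referenced_files(sample_scan).items():
        print(path, "references", ", ".join(files))
```

Every consumer of the scan results has to repeat this path comparison, which is the overhead an explicit flag would remove.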

How This Feature will help you/your organization

Quicker filtering and processing of interesting results.

Possible Solution/Implementation Details

Adding is_referenced, or something like it, should be a straightforward solution.
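As a rough illustration of what such a field could look like, the sketch below adds the flag to each match as a post-processing step. The is_referenced key does not exist in ScanCode output today; it is the hypothetical field proposed in this issue, written here as a boolean rather than "yes" purely as an assumption. The helper annotate_matches and the sample resource are also made up for this example.

```python
def annotate_matches(resource):
    """Add a hypothetical "is_referenced" flag to every match in place.

    A match counts as referenced when its "from_file" is set and differs
    from the path of the resource that reports it.
    """
    path = resource.get("path")
    for detection in resource.get("license_detections", []):
        for match in detection.get("matches", []):
            from_file = match.get("from_file")
            match["is_referenced"] = bool(from_file) and from_file != path
    return resource


# Abbreviated sample resource; only the fields used here are kept.
resource = {
    "path": "busybox-1.37.0/editors/awk.c",
    "license_detections": [
        {
            "matches": [
                {"from_file": "busybox-1.37.0/editors/awk.c"},
                {"from_file": "busybox-1.37.0/LICENSE"},
            ]
        }
    ],
}

annotate_matches(resource)

# With the flag in place, filtering no longer needs the path comparison.
referenced = [
    match["from_file"]
    for detection in resource["license_detections"]
    for match in detection["matches"]
    if match["is_referenced"]
]
```

With a flag like this in the output itself, a plain grep for is_referenced would be enough to spot these matches.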

Example/Links if Any

Can you help with this Feature
