Metrics for software

One of the main objectives of the FAIR-IMPACT project is to extend metrics for FAIR assessment to cover digital objects other than data. The original definition of the FAIR principles (Wilkinson et al., 2016) considers data, but further discussion has identified that software requires different considerations when it comes to designing FAIR in theory and in practice. In 2022, the FAIR Principles for Research Software (FAIR4RS) were published (Chue Hong et al.), extending the original principles to cover software more effectively. Building on these principles, FAIR-IMPACT now works on translating them into metrics and practical tests that can be assessed automatically using a tool.

Related to the topic of research software assessment, FAIR-IMPACT has also created a set of Research Software Metadata (RSMD) Guidelines. These offer end-users flexible and adaptable recommendations that can be used in different disciplines and different software development contexts. The metrics presented below also indicate how each metric links to the RSMD Guidelines.

Domain-agnostic metrics for software assessment

According to their latest version (v1.0), the generic Metrics for automated FAIR software assessment are as follows:

FRSM-01 - Does the software have a globally unique and persistent identifier?

Description. A software object may be assigned a globally unique identifier so that it can be referenced unambiguously by humans or machines. Globally unique means that an identifier should be associated with only one resource at any time. Examples of unique identifiers used for software include: Digital Object Identifier (DOI), the Handle System, Uniform Resource Identifier (URI) such as URL and URN, and Software Heritage Identifiers (SWHID). A repository may assign a globally unique identifier to your software or its metadata when you publish it and make it available through its curation service.

FAIR4RS Principle. F1: Software is assigned a globally unique and persistent identifier. R3: Software meets domain-relevant community standards.

RSMD recommendation. RSMD-3.3
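
As a rough illustration of how an automated test might start on FRSM-01, the sketch below classifies an identifier string by the schemes named in the description. The regular expressions and the function name `classify_identifier` are assumptions made for this example, not part of the metric specification, and they are deliberately simplified. A syntax check like this says nothing about persistence, which depends on the service that issued the identifier.

```python
import re

# Illustrative patterns only; real-world identifier syntax is broader than this.
# Order matters: more specific schemes are tried before the generic URL pattern.
PATTERNS = {
    "swhid": re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}(;.+)?$"),
    "doi": re.compile(r"^(https?://(dx\.)?doi\.org/|doi:)?10\.\d{4,9}/\S+$", re.IGNORECASE),
    "handle": re.compile(r"^(https?://hdl\.handle\.net/|hdl:)\S+/\S+$", re.IGNORECASE),
    "urn": re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,31}:\S+$", re.IGNORECASE),
    "url": re.compile(r"^https?://\S+$", re.IGNORECASE),
}


def classify_identifier(value):
    """Return the name of the first matching scheme, or None if nothing matches."""
    candidate = value.strip()
    for scheme, pattern in PATTERNS.items():
        if pattern.match(candidate):
            return scheme
    return None
```

A plain URL is reported as "url" rather than rejected, since the description lists URLs as a possible identifier; a stricter assessment could treat that result as unique but not demonstrably persistent.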

FRSM-02 - Do the different components of the software have their own identifiers?

Description. Conceptually, it is useful for identifiers to be assigned at a more granular level than just the software product (often synonymous with the “software concept” or “software project”). For instance, a software product may consist of different modules, which in turn may be implemented by different files. This metric tests that these different components are not all assigned the same identifier, and that the relationship between components is embodied in the identifier metadata.

FAIR4RS Principle. F1: Software is assigned a globally unique and persistent identifier. F1.1: Components of the software representing levels of granularity are assigned distinct identifiers.

RSMD recommendation. RSMD-3.2, RSMD-3.3, RSMD-3.5

FRSM-03 - Does each version of the software have a unique identifier?

Description. To make different versions of the same software (or component) findable, each version needs to be assigned a different identifier. The relationship between versions is embodied in the associated metadata.

FAIR4RS Principle. F1: Software is assigned a globally unique and persistent identifier. F1.2: Different versions of the software are assigned distinct identifiers. R3: Software meets domain-relevant community standards.

RSMD recommendation. RSMD-3.2, RSMD-3.3, RSMD-3.4
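
One way to picture the test behind FRSM-03 is a check that no two versions share an identifier. The sketch below assumes the version-to-identifier mapping has already been collected from a metadata record; the function name `shared_identifiers` and the example identifiers are made up for illustration.

```python
from collections import defaultdict


def shared_identifiers(version_ids):
    """Map each identifier used by more than one version to those versions.

    `version_ids` is a dict of version label -> identifier string.
    An empty result means every version has its own identifier.
    """
    by_identifier = defaultdict(list)
    for version, identifier in version_ids.items():
        by_identifier[identifier.strip()].append(version)
    return {
        identifier: sorted(versions)
        for identifier, versions in by_identifier.items()
        if len(versions) > 1
    }


# Made-up release list in which two versions reuse one identifier.
releases = {
    "1.0.0": "10.1234/example.v1",
    "1.1.0": "10.1234/example.v2",
    "1.2.0": "10.1234/example.v2",
}
print(shared_identifiers(releases))
```

The comparison is an exact string match after trimming whitespace. Identifier schemes differ in case sensitivity, so a real test would need to normalise each scheme separately.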

FRSM-04 - Does the software include descriptive metadata which helps define its purpose?

Description. Software requires descriptive metadata to support indexing, search and discoverability.

FAIR4RS Principle. F2: Software is described with rich metadata. R1: Software is described with a plurality of accurate and relevant attributes. R3: Software meets domain-relevant community standards.

RSMD recommendation. RSMD-1.1, RSMD-4.1, RSMD-4.2, RSMD-4.3, RSMD-4.4
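
A minimal sketch of a descriptive-metadata check for FRSM-04 follows. It assumes the metadata has been loaded into a dictionary, for example from a CodeMeta JSON file, and that `name`, `description` and `keywords` are the fields of interest. That field list is an assumption for this example; the metric itself does not prescribe it.

```python
# Hypothetical choice of fields; the metric does not mandate this list.
DESCRIPTIVE_FIELDS = ("name", "description", "keywords")


def missing_descriptive_fields(metadata, required=DESCRIPTIVE_FIELDS):
    """Return the required fields that are absent or empty in a metadata dict."""
    missing = []
    for field in required:
        value = metadata.get(field)
        if isinstance(value, str):
            value = value.strip()  # treat whitespace-only strings as empty
        if not value:
            missing.append(field)
    return missing


# Made-up record with a blank description.
record = {"name": "demo-tool", "description": "  ", "keywords": ["survey"]}
print(missing_descriptive_fields(record))
```

Presence of a field is all this can establish. Whether a description is accurate or rich enough still needs human judgement.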

FRSM-05 - Does the software include development metadata which helps define its status?

Description. Software requires descriptive metadata to support indexing, search and discoverability.

FAIR4RS Principle. F2: Software is described with rich metadata. R1: Software is described with a plurality of accurate and relevant attributes. R3: Software meets domain-relevant community standards.

RSMD recommendation. RSMD-4.2, RSMD-4.4, RSMD-4.5

FRSM-06 - Does the software include metadata about the contributors and their roles?

Description. Software should make it easy to recognise and credit all contributors.

FAIR4RS Principle. F2: Software is described with rich metadata. R3: Software meets domain-relevant community standards.

RSMD recommendation. RSMD-5.1, RSMD-5.2, RSMD-5.3, RSMD-5.4, RSMD-5.5, RSMD-5.6, RSMD-5.7, RSMD-5.8

FRSM-07 - Does the software metadata include the identifier for the software?

Description. Software should include its identifier in its metadata to make it easier to cite and index.

FAIR4RS Principle. F3: Metadata clearly and explicitly include the identifier of the software they describe. R3: Software meets domain-relevant community standards.

RSMD recommendation. No related RSMD recommendation.
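
For FRSM-07, one possible check is to look for an identifier inside a citation metadata file. The sketch below assumes a CITATION.cff file has already been parsed into a dictionary (the parsing step is left out) and reads the `doi` and `identifiers` keys defined by the Citation File Format. The function name `cff_identifiers` and the sample values are hypothetical.

```python
def cff_identifiers(cff):
    """Collect (type, value) pairs from the `doi` and `identifiers` keys.

    `cff` is a dict holding the parsed content of a CITATION.cff file.
    An empty list means the metadata carries no identifier for the software.
    """
    found = []
    doi = cff.get("doi")
    if isinstance(doi, str) and doi.strip():
        found.append(("doi", doi.strip()))
    for entry in cff.get("identifiers") or []:
        if isinstance(entry, dict) and entry.get("type") and entry.get("value"):
            found.append((entry["type"], entry["value"]))
    return found


# Made-up citation metadata carrying two identifiers.
cff = {
    "title": "Demo",
    "doi": "10.1234/demo",
    "identifiers": [{"type": "swh", "value": "swh:1:rel:" + "0" * 40}],
}
print(cff_identifiers(cff))
```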

FRSM-08 - Does the software have a publicly available, openly accessible and persistent metadata record?

Description. Even if the software itself is no longer usable or accessible, its metadata should still be available and accessible.

FAIR4RS Principle. F4: Metadata are FAIR, searchable and indexable. A2: Metadata are accessible, even when the software is no longer available. R3: Software meets domain-relevant community standards. May enable compliance with F1, F1.1, F1.2, F2 and F3.

RSMD recommendation. RSMD-1.2

FRSM-09 - Is the software developed in a code repository / forge that uses standard communications protocols?

Description. Software source code repositories / forges (a.k.a. version control platforms) should use standard communications protocols (such as HTTPS / SFTP) to enable the widest possible set of contributors.

FAIR4RS Principle. A1: Software is retrievable by its identifier using a standardised communications protocol. A1.1: The protocol is open, free, and universally implementable. A1.2: The protocol allows for an authentication and authorization procedure, where necessary. R3: Software meets domain-relevant community standards.

RSMD recommendation. RSMD-1.3
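
A simple way to approximate FRSM-09 is to inspect the scheme of the repository URL. The sketch below uses the two protocols given as examples in the description as its allow-list; that list, and the function name `uses_standard_protocol`, are assumptions for illustration and a real assessment may accept more protocols.

```python
from urllib.parse import urlparse

# Assumed allow-list, taken from the examples in the metric description.
STANDARD_SCHEMES = {"https", "sftp"}


def uses_standard_protocol(repository_url, allowed=STANDARD_SCHEMES):
    """True if the URL names a host and its scheme is in the allow-list."""
    parsed = urlparse(repository_url.strip())
    return parsed.scheme.lower() in allowed and bool(parsed.netloc)


print(uses_standard_protocol("https://example.org/group/project.git"))
print(uses_standard_protocol("ftp://example.org/project"))
```

This only looks at the URL syntax. It does not contact the server, so it cannot confirm that the repository actually responds over that protocol.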

FRSM-10 - Are the formats used by the data consumed or produced by the software open and a reference provided to the format?

Description. The use of open file formats for data improves the reusability and understandability of the software.

FAIR4RS Principle. I1: Software reads, writes and exchanges data in a way that meets domain-relevant community standards. I2: Software includes qualified references to other objects.

RSMD recommendation. RSMD-7.6

FRSM-11 - Does the software use open APIs that support machine-readable interface definition?

Description. An open Application Programming Interface can be freely accessed by other software or developers, which makes it easier to integrate software and encourages modularity and reuse.

FAIR4RS Principle. I1: Software reads, writes and exchanges data in a way that meets domain-relevant community standards.

RSMD recommendation. No related RSMD recommendation.

FRSM-12 - Does the software provide references to other objects that support its use?

Description. Determining the usefulness of a piece of software is often aided by understanding what it is used with.

FAIR4RS Principle. I2: Software includes qualified references to other objects.

RSMD recommendation. RSMD-4.3, RSMD-7.6

FRSM-13 - Does the software describe what is required to use it?

Description. Software is made more reusable by providing suitable machine-actionable information on dependencies, build and configuration.

FAIR4RS Principle. R1: Software is described with a plurality of accurate and relevant attributes. R2: Software includes qualified references to other software.

RSMD recommendation. RSMD-7.1, RSMD-7.2, RSMD-7.3, RSMD-7.4, RSMD-7.5
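
For FRSM-13, an automated test could look for well-known files that describe dependencies, build and configuration. The sketch below works on a list of repository paths and a small, non-exhaustive table of file names. Both the table and the function name `requirement_files` are assumptions chosen for this example.

```python
from pathlib import PurePosixPath

# Assumed, non-exhaustive mapping from well-known file names to what they describe.
REQUIREMENT_FILES = {
    "requirements.txt": "dependencies",
    "pyproject.toml": "dependencies and build",
    "pom.xml": "dependencies and build",
    "package.json": "dependencies and build",
    "Dockerfile": "build and runtime environment",
    "Makefile": "build",
}


def requirement_files(paths):
    """Map each recognised file found at the repository root to its role."""
    found = {}
    for path in paths:
        parts = PurePosixPath(path).parts
        # Only files at the top level count in this sketch.
        if len(parts) == 1 and parts[0] in REQUIREMENT_FILES:
            found[parts[0]] = REQUIREMENT_FILES[parts[0]]
    return found


# Made-up repository listing.
listing = ["README.md", "pom.xml", "src/main/java/App.java",
           "docs/requirements.txt", "Dockerfile"]
print(requirement_files(listing))
```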

FRSM-14 - Does the software come with test cases to demonstrate it is working?

Description. The provision of test cases improves confidence in the software.

FAIR4RS Principle. R1: Software is described with a plurality of accurate and relevant attributes.

RSMD recommendation. RSMD-7.5

FRSM-15 - Does the software source code include licensing information for the software and any bundled external software?

Description. Clear software licensing enables reuse.

FAIR4RS Principle. R1.1: Software is given a clear and accessible licence.

RSMD recommendation. RSMD-6.2, RSMD-6.4, RSMD-6.5, RSMD-6.6
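
Licensing information in source code can be looked for in two places: a licence file and per-file SPDX tags. The sketch below scans an in-memory mapping of path to file text, which stands in for a checked-out repository. The file-name pattern, the 20-line header window and the function name `licence_evidence` are assumptions for this example.

```python
import re

# Common licence file names; an assumed, non-exhaustive pattern.
LICENSE_FILE = re.compile(r"^(LICEN[CS]E|COPYING)(\.(txt|md|rst))?$", re.IGNORECASE)
# SPDX short-form tag, with an optional trailing comment terminator.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(\*/|-->)?\s*$")


def licence_evidence(files):
    """Summarise licence files and SPDX tags found in a {path: text} mapping."""
    licence_files = []
    spdx = {}
    for path, text in files.items():
        name = path.rsplit("/", 1)[-1]
        if LICENSE_FILE.match(name):
            licence_files.append(path)
        # Only the first 20 lines are inspected, where header tags usually sit.
        for line in text.splitlines()[:20]:
            match = SPDX_TAG.search(line)
            if match:
                spdx[path] = match.group(1)
                break
    return {"licence_files": sorted(licence_files), "spdx": spdx}


# Made-up repository content, including one bundled external file.
files = {
    "LICENSE": "Apache License\nVersion 2.0",
    "src/app.py": "# SPDX-License-Identifier: Apache-2.0\nprint('hi')\n",
    "vendor/lib.c": "/* SPDX-License-Identifier: MIT */\nint x;\n",
    "README.md": "Demo",
}
print(licence_evidence(files))
```

Reporting the tag per file makes it possible to see when bundled external software carries a different licence from the main code base, which is the second half of what the metric asks about.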

FRSM-16 - Does the software metadata record include licensing information?

Description. It is important for licensing information to be on the publicly searchable and accessible metadata record.

FAIR4RS Principle. R1.1: Software is given a clear and accessible licence.

RSMD recommendation. RSMD-6.3

FRSM-17 - Does the software include provenance information that describes the development of the software?

Description. Good provenance metadata clarifies the origins and intent behind the development of the software, and establishes authenticity and trust. As a type of metadata this overlaps with the metadata called for in guiding principles F2 and F4.

FAIR4RS Principle. R1.2: Software is associated with detailed provenance.

RSMD recommendation. RSMD-4.5

Discipline-specific metrics for software assessment

Another objective of the FAIR-IMPACT project is to build upon the current assessment metrics and tailor them to specific disciplines. Through extensive analysis of the current community practices, some metrics have been identified that can be made specific to certain communities. 

For the social sciences community, the following discipline-specific metrics have been created in collaboration with CESSDA:

FRSM-01-CESSDA - Does the software have a globally unique and persistent identifier?

Comments. See the Software Publication of open source components as per CESSDA’s Publication Policy & Procedures.

As described in the CESSDA ERIC Persistent Identifier Policy, CESSDA tools and services accept: DOI, Handle (including ePIC-handles), URN, ARK (fulfilling principle 10 of the CESSDA Data Access Policy).

FRSM-02-CESSDA - Can different components of the software be individually identified?

Comments. CESSDA requirements for modularity are defined in CMA4: Modularity.

CESSDA’s products are designed and built using a microservices approach. It is expected that a separate Git repository is used for the source code of each component (aka microservice).

FRSM-03-CESSDA - Does each version of the software have a unique identifier?

Comments. These are derived from the CESSDA Software Publication policy and procedures for open source components, as set out in the CESSDA Publication Policy & Procedures.

FRSM-04-CESSDA - Does the software include descriptive metadata which helps define its purpose?

Comments. CESSDA technical guidelines on CMA1: Documentation define what is required from end-user documentation, operational documentation, and development documentation but these are not machine-accessible.

The CESSDA Software Requirements also demand that all tools and products have a comprehensive README.

FRSM-05-CESSDA - Does the software include development metadata which helps define its status?

Comments. Some of this metadata is machine readable but requires interpretation. For CESSDA, active status would be defined as there being a recent release (release date) and that it is maintained (recent commits).
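
The interpretation step described above can be sketched as a date comparison. The source does not say how recent a release or commit must be, so the two window lengths below are hypothetical thresholds, as is the function name `is_active`.

```python
from datetime import date, timedelta


def is_active(last_release, last_commit, today,
              release_window_days=365, commit_window_days=90):
    """True if both the latest release and the latest commit are recent.

    The default window lengths are hypothetical thresholds, not CESSDA values.
    """
    recent_release = today - last_release <= timedelta(days=release_window_days)
    recent_commit = today - last_commit <= timedelta(days=commit_window_days)
    return recent_release and recent_commit


# Made-up dates: released within the year and committed to within 90 days.
print(is_active(date(2024, 1, 15), date(2024, 5, 20), today=date(2024, 6, 1)))
```

Passing `today` explicitly keeps the check reproducible, which matters if assessment results are compared over time.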

FRSM-06-CESSDA - Does the software include metadata about the authors and their roles?

Comments. Authorship criteria should follow the CESSDA Publication Policy & Procedures. CESSDA uses Citation File Format for recording authorship, e.g. CDC-Searchkit citation.

FRSM-07-CESSDA - Does the software metadata include the identifier of the software?

Comments. The Zenodo DOI representing all versions will always resolve to the latest version in Zenodo.

CESSDA uses Citation File Format, which can include a reference to the software identifier.

FRSM-08-CESSDA - Does the software have a publicly available, openly accessible and persistent metadata record?

Comments. Software releases of open source components should be published on Zenodo, as per CESSDA’s Publication Policy & Procedures. Recommended metadata from the CESSDA Technical Guidelines on Software Publication include version, authors, name, description and identifier.

FRSM-09-CESSDA - Is the software developed in a code repository/forge that uses standard communication protocols?

Comments. Development of CESSDA tools and services is carried out using CESSDA-owned Git repositories on GitHub. If the code is developed publicly elsewhere, mirroring with clear pointers to the upstream is used.

FRSM-10-CESSDA - Are the data formats used by the software open and a reference provided to the format?

Comments. CESSDA documents its approach to open data standards in CMA7 - Standards Compliance.

FRSM-11-CESSDA - Does the software use open APIs that support machine-readable interface definition?

Comments. Expectations around the API definition and documentation are set out in the section on CMA1.3 Development Documentation of the CESSDA Technical Guidelines. The section on CMA7 Demonstrate Usability notes that at SML5 (excellent standard), compliance with open or internationally recognised standards for the software and software development process is evident and documented, and verified through testing of all components. At present, this is not included in the assessment criteria because it is hard to test automatically, but it could be verified through regular testing and certification by an independent group.

An extensible service enables additional services to be built on or around it, including adapting to changing functional requirements over time. This is done by making the integration point the API. New and/or existing services can be combined as required via their APIs to meet changing functional requirements. Versioning the APIs and supporting two versions simultaneously allows services to evolve, without breaking the contract they provide to their consumers.

FRSM-12-CESSDA - Does the software provide references to other objects that support its use?

Comments. CESSDA uses the “docs-as-code” approach for end user and content editor demonstration, so it is hard for CESSDA tools and services to demonstrate compliance with this metric, and it is not useful to assess at present. CESSDA does not currently require publications describing the software. If this changed, a suitable assessment for this metric would be to test whether the identifier of the publication is included in the software metadata.

FRSM-13-CESSDA - Does the software describe what is required to use it?

Comments. See Software Maturity Levels (SML) for: CMA1 - Documentation, CMA3 - Extensibility, CMA4 - Modularity, CMA5 - Packaging, CMA6 - Portability, and CMA7 - Standards Compliance.

Source code documentation should use the de facto standard for the chosen language, e.g. Javadoc for Java. Although no language-specific coding conventions are mandated, the ‘Coding conventions for languages’ section of the Wikipedia Coding conventions page is a useful reference source for language-specific guidelines, if required.

FRSM-14-CESSDA - Does the software come with test cases to demonstrate it is working?

Comments. See Software Maturity Levels (SML) for: CMA9 - Verification and Testing and CMA7 - Standards Compliance.

CESSDA periodically runs the SQAaaS tool against its publicly accessible repositories and displays the results via a badge in the README file.

FRSM-15-CESSDA - Does the software source code include licensing information for the software and any bundled external software?

Comments. CESSDA guidance on licence information is part of the guidelines on Standard Git Repository Contents. Further guidance is provided as part of the guidance on CMA2 - Intellectual Property.

FRSM-16-CESSDA - Does the software metadata record include licensing information?

Comments. CESSDA guidance on licence information is part of the guidelines on Standard Git Repository Contents.

FRSM-17-CESSDA - Does the software include provenance information?

Comments. Git repositories include a commit history as a matter of course. CESSDA uses git repos on GitHub, and uses a branching model where each branch is prefixed with the issue tracker ticket number that it addresses.
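
The branching convention described above lends itself to a simple pattern check. The source does not give the ticket number format, so the sketch below assumes a run of digits followed by a separator (for example `123-fix-login`). The exempt branch name `main` and both function names are likewise hypothetical.

```python
import re

# Assumed convention: the ticket number is a run of digits followed by a separator.
TICKET_PREFIX = re.compile(r"^(\d+)[-_/].+")


def ticket_for_branch(branch):
    """Return the leading ticket number of a branch name, or None."""
    match = TICKET_PREFIX.match(branch)
    return match.group(1) if match else None


def branches_without_ticket(branches, exempt=("main",)):
    """List branches that are not exempt and carry no ticket prefix."""
    return [
        branch for branch in branches
        if branch not in exempt and ticket_for_branch(branch) is None
    ]


# Made-up branch names.
print(branches_without_ticket(["main", "42-add-docs", "hotfix"]))
```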

Further information

Community feedback

We are very happy to invite community feedback on this work. You can do this in different ways:

  • Commenting on this webpage. In the subject line of your comment, please give the identifier of the metric you are referring to (e.g. FRSM-01).
  • Providing direct comments on the full report. You can leave suggestions and comments on specific parts of deliverable D5.2 that presents the metrics.
  • Would you like more personal contact on the topic of metrics for software? Get in touch with us by sending an email to [email protected]

As you read and comment on these metrics please bear in mind the following: 

  • In the FAIR ecosystem, FAIR assessment must go beyond the object itself. FAIR enabling services and repositories are vital to ensure that research data objects remain FAIR over time. 
  • Automated testing depends on clear, machine-accessible criteria. Some aspects (rich, plurality, accurate, relevant) specified in FAIR principles still require human mediation and interpretation. 
  • Until domain/community-driven criteria such as schemas and usage elements have been agreed, the tests must focus on generally applicable data/metadata characteristics. 

Metrics for software can be assessed using automated assessment tools. You can find the tools that FAIR-IMPACT works with on the page here.


 

 
