Scientific software can be published openly to allow others to reproduce your work, extend and build upon your code, and identify bugs or enhancements.

Research Software is defined in Gruenpeter et al. 2021 as:

“…source code files, algorithms, scripts, computational workflows and executables that were created during the research process or for a research purpose. Software components (e.g., operating systems, libraries, dependencies, packages, scripts, etc.) that are used for research but were not created during or with a clear research intent should be considered software in research and not Research Software. This differentiation may vary between disciplines.”

Chue Hong et al. (2022) have developed an application of the FAIR principles to research software. The principles are reproduced below.

Findable

Software, and its associated metadata, is easy for both humans and machines to find.

Sub-Principle Description
F1. Software is assigned a globally unique and persistent identifier.
F1.1. Components of the software representing levels of granularity are assigned distinct identifiers.
F1.2. Different versions of the software are assigned distinct identifiers.
F2. Software is described with rich metadata.
F3. Metadata clearly and explicitly include the identifier of the software they describe.
F4. Metadata are FAIR, searchable and indexable.

Accessible

Software, and its metadata, is retrievable via standardized protocols.

Sub-Principle Description
A1. Software is retrievable by its identifier using a standardized communications protocol.
A1.1. The protocol is open, free, and universally implementable.
A1.2. The protocol allows for an authentication and authorization procedure, where necessary.
A2. Metadata are accessible, even when the software is no longer available.

Interoperable

Software interoperates with other software by exchanging data and/or metadata, and/or through interaction via application programming interfaces (APIs), described through standards.

Sub-Principle Description
I1. Software reads, writes and exchanges data in a way that meets domain-relevant community standards.
I2. Software includes qualified references to other objects.

Reproducible

Software is both usable (can be executed) and reusable (can be understood, modified, built upon, or incorporated into other software).

Sub-Principle Description
R1. Software is described with a plurality of accurate and relevant attributes.
R1.1. Software is given a clear and accessible license.
R1.2. Software is associated with detailed provenance.
R2. Software includes qualified references to other software.
R3. Software meets domain-relevant community standards.

You may be interested in the following relevant practices for scientific software.

Release

When developing software, the code is a living document which is under constant development. Because code under development is constantly changing, it can be confusing for those who wish to use the software to reproduce a specific result.

A software release is a specific instance or distribution of a collection of source code or a compiled programme. A software release points to a specific instance of the software. A software release can be archived, cited and used to support reproducible research. A software release supports many of the practices of Open Science.

Read more about the release practice

Version control

Many research teams that develop scientific software work with version control such as Git and Github. A version control system tracks all the changes that you make and allows to also revert back to older versions. The version control system also allows for collaborative work as each contributor/author can create an own “branch” and develop new code. The new code can be reviewed and compared to the other developments and “merged” to the main branch if the new development is accepted. When the code is ready and tested it can be released.

Read more about the version control practice

License

When releasing a software it is important to include a license from the very beginning. There are many different licenses that may be appropriate.

Find more about the license practice

Metadata

Metadata is “data about the data” meaning that it describes the context, quality and condition of a e.g. dataset. The motivation to use metadata in software are because they are machine readable, so the software release can be indexed (e.g. on PyPI, or Zenodo) which makes it findable. Metadata follow a different standard depending on where the software is released.

Find more about the metadata practice

Attribution

To make sure that the right people who has contributed to the software development are properly attributed one good practice is that the authors for each script can be added on the top of the file. If documentation software such as Sphinx is used then, if the metadata is added on the top of the file in a certain format, it will be added to the documentation file. If the code is collaboratively developed on Github then each author that commits to the main branch will be added as a contributor to the code. If the code is released the contributors to the Github repository will be added as contributors/authors to that release.

Read more about the attribution practice

Example of scientific software - PyPSA

The PyPSA-Eur: An Open Optimisation Model of the European Transmission System is a good example of how to manage scientific software. The software is a workflow is written in Python and executed using Snakemake. The source code is held under version control and publishing on Github under an open license. Each release is deposited on Zenodo where it is ascribed a DOI. In addition the software is documented in a short academic paper which is also archived as a pre-print in arxiv.


This material is derived from the CCG review of good enough practices, released under a CC-BY 4.0 license