A release process identifies a specific instance of a piece of continually updated software, data, tool or method and makes that instance findable and accessible.

By depositing the product of the release in an archive such as Zenodo, a provenance trail from the research output to the exact sources that contribute to that output is created. It also makes it very easy to communicate precisely about what sources of data, software, etc. you are using.

“Releasing” or “creating a release” is a practice that stems from the software development cycle and is facilitated by use of version control. However, you do not have to use version control to apply the principles to your own work.

In the following sections, we explain how you can use the practice of creating a release to meet the requirements of reproducible research.

Existing standards

In software development, a common standard to identify software releases is to use semantic versioning, where three numbers are separated by full-stops, and the numbers are incremented after major, minor and “patch” changes respectively. The advantage of using semantic versioning is that it becomes very easy to reference a specific version (e.g.v1.2.1) and determine that another version is newer (v1.2.2 and v1.10.3 are newer than v1.2.1), and to what extent the versions differ (v1.0 to v2.0 will include major “breaking” changes).

Releasing data

After incrementing the version number, you can archive your data set using Zenodo or another sector-specific deposit for data, such as energydata.info

As the dataset is updated, releases are made at regular intervals with previous versions of the data remaining publicly available under their earlier version numbers. This allows users to see how the data has changed over time, and allows users to adjust to major or structural changes in the dataset.

When using semantic versioning for data, it is important to document how the concepts of major, minor and patch changes apply to your dataset. For example, a major change may involve structural changes that are backward incompatible. A “patch” change would involve updating numbers in a table or file.

Releasing other material

There are examples of version control, tags and releases applied to documents, such as data standards and ontologies 6 to clarify the development process and enable distributed collaboration which suit these technical documents.

Once you are comfortable with the concept of releases, you can apply a release process to any material in a similar way as you can for data and software.

Some examples:

  • a diagram or image can be tracked under version control and periodically released using the same process as for software
  • a journal article can tracked under version control and worked on collaboratively using Github. v1.0 is submitted for review and to a pre-print archive. The revised article is v2.0, and some final small edits result in v2.1 being accepted for publication. Some typos fixed in the version of the article in the pre-print archive result in v1.1
  • lecture material is worked on collaboratively. Last year, v1.0.0 of the lecture was published in OpenLearnCreate. This year, the material was updated and v1.1 was published, only for an image to be replaced in one of the slides at the last minute (v1.1.1). Next year, the material will be completely revised (v2.0).

Releasing software

When developing scientific code, a typical development cycle is followed where iterative development phases follow a pattern:

  1. requirement gathering
  2. design
  3. implementation
  4. verification & validation

A release publishes each verified and validated version of the software. Software releases can be managed using only version control and, for example, Github Releases. First, create a new tag using semantic versioning. Then create a new release on Github based on that tag.

More advanced software development will incorporate continuous testing which is triggered whenever code is changed, and continuous deployment which is triggered whenever code is tagged and a release is created.

It is also good practice to archive your software using Zenodo. An automatic link between Github and Zenodo deposits the source code on Zenodo upon every release.

Releasing software, an advanced example

The Python package smif uses the following procedure for handling new releases [edited for clarity]. The source code is held under version control using Git and published on Github. The Github issue tracker is used to track the release process. The issue is closed upon successful completion of all the steps. An issue template is used to ensure the same procedure is followed for each release.

Release step Description
create a release branch by creating a branch to track a release, the release can be prepared independently of the main codebase
update CHANGELOG file with notes on features and fixes since last release. The changelog contains a summary of all the changes made since the previous release
update AUTHORS file A list of all the contributors to the software
create and push an annotated tag A “tag” uniquely identifies the version of the code which forms the release. Semantic versioning is used for the tag name. The annotation is a brief description of the release.
open a pull request The pull request offers an opportunity to review and field comments from contributors. It also triggers automated tests, and if these pass, automated deployment of the release to the package index
wait (~15 minutes) for tests to pass The tests affirm code quality and that the release will be ready for production (i.e. it will work)
check that deploy stage ran as expected, new version available on https://pypi.org/project/smif/#history The released software is available for installation on the Python Packaging Index
create a GitHub release for the tag A Github release triggers the release to be archived on Zenodo

See the complete notes in the smif documentation.


This material is derived from the CCG review of good enough practices which is released under a CC-BY 4.0 license.