Version control systems record a history of changes to source code, data, or other text-based material, enabling collaboration between distributed teams of researchers.

Version control systems, such as Git, Mercurial and Subversion, allow researchers to switch between points in the development history of the tracked material, view the differences between those points, and create new modifications from any point in the history. In short, version control provides “infinite levels of undo”.

The history of changes is stored in a “repository”, which can be navigated using a command-line tool or a graphical user interface.

Version control enables collaborative workflows in either a centralised or a distributed fashion, both of which can effectively support scientific work. Conflicts between changes can be resolved systematically because each change (or “commit”) to tracked content is recorded with associated metadata: the date and time, the author, and a commit message. This gives collaborators the freedom to work in parallel, make changes as they see fit, and merge their changes when they are ready.
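The ideas above (a stored history, commits with metadata, switching between points, and starting new work from any point) can be illustrated with a minimal sketch. This is a toy model, not how Git or any real system is implemented; the names `Commit` and `ToyRepository` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    """One recorded point in the history (hypothetical toy structure)."""
    content: str            # full snapshot of the tracked text
    author: str
    message: str
    parent: Optional[int]   # index of the previous commit, None for the first
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class ToyRepository:
    """Append-only store of commits; earlier versions are never overwritten."""

    def __init__(self) -> None:
        self.commits: List[Commit] = []

    def commit(self, content: str, author: str, message: str,
               parent: Optional[int] = None) -> int:
        # By default a new commit follows the most recent one, but any
        # earlier commit can be named as the starting point instead.
        if parent is None and self.commits:
            parent = len(self.commits) - 1
        self.commits.append(Commit(content, author, message, parent))
        return len(self.commits) - 1

    def checkout(self, commit_id: int) -> str:
        """Return the content exactly as it was at the given commit."""
        return self.commits[commit_id].content

    def log(self, commit_id: int) -> List[str]:
        """Walk back from a commit to the start, newest message first."""
        messages = []
        current: Optional[int] = commit_id
        while current is not None:
            messages.append(self.commits[current].message)
            current = self.commits[current].parent
        return messages


repo = ToyRepository()
first = repo.commit("x = 1\n", "alice", "Add x")
second = repo.commit("x = 2\n", "alice", "Change x")
# Bob starts from the first commit rather than the latest one.
branch = repo.commit("x = 1\ny = 3\n", "bob", "Add y", parent=first)

print(repo.checkout(first))   # the original content is still recoverable
print(repo.log(branch))       # ['Add y', 'Add x']
```

Real version control systems store changes far more efficiently and add merging, but the principle is the same: every commit keeps its author, time and message, and no earlier state is lost.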

The use of version control supports reproducibility and transparency through the consistent and structured workflow that working with a version control system requires. For example, the messages associated with changes to the code (“commits”) provide a running commentary on the development of the work itself. Version control also allows entire repositories to be shared easily, which facilitates publishing the work online through websites such as GitHub and GitLab.

Applying version control to software

Version control is now seen as a cornerstone of reproducible computational research. Even small changes to computer code can have large unintended consequences for the output of scientific software, so version control provides a structured and convenient means of recording which version of the software was used for a study. The ability to switch easily between code versions supports systematic approaches to debugging and the reproduction of results from different points in the project (Sandve et al., 2013).
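One way to record which version of the software produced a result is to store the current commit identifier alongside the output. The sketch below assumes the analysis code lives in a Git repository and that the `git` command is installed; the function names `current_commit` and `provenance_line` are hypothetical and introduced only for illustration.

```python
import subprocess
from typing import Optional


def current_commit(repo_dir: str = ".") -> Optional[str]:
    """Return the hash of the checked-out commit, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # git is not installed, or repo_dir does not exist
        return None
    if result.returncode != 0:
        # not a Git repository, or the repository has no commits yet
        return None
    return result.stdout.strip()


def provenance_line(commit: Optional[str]) -> str:
    """Format a line that can be written at the top of a results file."""
    return f"code_version: {commit if commit else 'unknown'}"


print(provenance_line(current_commit()))
```

Writing this line into every results file means that a figure or table can later be traced back to the exact code that generated it, provided the work was committed before the analysis was run.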

Applying version control to data

Most version control systems are designed for management of source code, which is text-based.

If data is stored as text, for example in comma-separated values (CSV) files or as a Frictionless Data Package, then that data can be placed under version control. This provides all the benefits listed above: it enables collaboration and provides a systematic process for updating and versioning the data. Applying version control to data contributes to the findable, accessible and reusable aspects of the FAIR principles.
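Text-based data works well under version control because changes can be compared line by line. The sketch below uses Python's standard `difflib` module to show the kind of comparison a version control system reports between two versions of a CSV file; the file name and the measurements are invented for the example.

```python
import difflib
from typing import List

# Two hypothetical versions of the same small data file.
old_version = "site,temperature_c\nA,11.2\nB,9.8\n"
new_version = "site,temperature_c\nA,11.2\nB,10.1\nC,8.7\n"


def csv_diff(old: str, new: str) -> List[str]:
    """Return a unified diff of two CSV texts, one entry per output line."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="measurements.csv (old)",
            tofile="measurements.csv (new)",
            lineterm="",
        )
    )


for line in csv_diff(old_version, new_version):
    print(line)
```

Lines starting with `-` were removed and lines starting with `+` were added, so a corrected value appears as one removed row and one added row. The same comparison on a binary file would not be readable, which is why text formats are preferred for versioned data.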

There are also a number of tools that allow binary data to be placed under version control, including Git LFS, which replaces large files with text pointers to files stored on a server, and DVC, which was built for data science and machine learning projects.
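The pointer idea can be sketched as follows: instead of the large file itself, the repository tracks a few lines of text that identify the file by its checksum and size. The layout below follows the Git LFS pointer format as we understand it, but it is an illustration only and not a substitute for the Git LFS client; the function name `lfs_style_pointer` is hypothetical.

```python
import hashlib


def lfs_style_pointer(data: bytes) -> str:
    """Build a small text pointer describing a (possibly large) binary blob."""
    digest = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{digest}\n"
        f"size {len(data)}\n"
    )


# Stand-in for the contents of a large binary file.
blob = bytes(range(256)) * 4096
print(lfs_style_pointer(blob))
```

The pointer is a few lines long regardless of how large the original file is, so the repository stays small while the checksum still identifies exactly which version of the data was used.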

Applying version control to other material

Version control can be applied to any text-based material. A version control repository can contain a mixture of text-based material, including computer code, data and documentation. For example, some researchers use version control to facilitate collaboration on journal articles.

Software Carpentry use GitHub to host all of their teaching material. Contributors use the collaboration features provided by the Git version control system and the GitHub website to develop, organise and update the material. The “source code” of the online teaching material is written in Markdown, a markup language that allows contributors to style a web page using simple syntax. New ideas are submitted through the issue tracker, where they are discussed and prioritised. Contributions are submitted as pull requests, reviewed by the administration team, and finally merged into the main branch, which holds the latest version of the material. This material is then rendered on the respective course website.

Further reading and next steps

The Turing Way has a section on version control. To learn version control, there are many online courses available. CodeRefinery provide an “Intro to Git” course, and Software Carpentry have a similar course, “Version Control with Git”. Their materials are free to view and use, and they also run in-person and remote training.


This material is derived from the CCG review of good enough practices, which is released under a CC-BY 4.0 license.