Metadata makes it possible for others to find your research, understand if they can use it, and then reference your work in their publications.

Metadata is “data about data”. When publishing a research output, such as a dataset, metadata provides information that describes the “context, quality and condition” of the output.

To support the retrievability of specific outputs, their inclusion in indexes, and their correct citation, metadata should be both machine-readable and human-readable (GO FAIR, 2022).

Several metadata standards are already available for different knowledge domains. One of the most common is Dublin Core, maintained by the Dublin Core Metadata Initiative (DCMI).
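As an illustration of what a standards-based record can look like, the following minimal sketch serialises a handful of Dublin Core elements to XML using only the Python standard library. The element names and namespace come from the Dublin Core Metadata Element Set; the wrapper element name, the function name and every value in the example record (including the DOI) are hypothetical placeholders, not taken from the original material.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core_xml(record: dict) -> str:
    """Serialise a flat dict of Dublin Core elements to an XML string."""
    # The wrapper element name is an arbitrary choice for this sketch.
    root = ET.Element("metadata")
    for element, values in record.items():
        # Dublin Core elements are repeatable, so accept one value or a list.
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical dataset description; every value here is a placeholder.
example_record = {
    "title": "Example household survey dataset",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "date": "2022-01-01",
    "type": "Dataset",
    "identifier": "https://doi.org/10.1234/example",
    "rights": "https://creativecommons.org/publicdomain/zero/1.0/",
}

xml_text = to_dublin_core_xml(example_record)
print(xml_text)
```

The same record stays readable for a person skimming the file, while the namespaced element names let a harvester or index interpret each field without guessing.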

To enable indexing of outputs and to use the metadata to “advertise” the output, it is important that the metadata is in the public domain (e.g. released under a CC0 public domain dedication).

Metadata and the FAIR principles

The FAIR principles provide clear guidance on the importance of metadata for describing data. Metadata is directly mentioned in several of the sub-principles.

Sub-Principle  Description
F2             Data are described with rich metadata (defined by R1 below)
F3             Metadata clearly and explicitly include the identifier of the data they describe
A2             Metadata are accessible, even when the data are no longer available

Metadata are data, and so the remainder of the FAIR principles also apply to the metadata itself:

Sub-Principle  Description
F1             (Meta)data are assigned a globally unique and persistent identifier
F4             (Meta)data are registered or indexed in a searchable resource
A1             (Meta)data are retrievable by their identifier using a standardised communications protocol
A1.1           The protocol is open, free, and universally implementable
A1.2           The protocol allows for an authentication and authorisation procedure, where necessary
I1             (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2             (Meta)data use vocabularies that follow FAIR principles
I3             (Meta)data include qualified references to other (meta)data
R1             (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1           (Meta)data are released with a clear and accessible data usage license
R1.2           (Meta)data are associated with detailed provenance
R1.3           (Meta)data meet domain-relevant community standards
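
A few of these sub-principles can be turned into a simple completeness check on a metadata record. The sketch below is only an illustrative heuristic, not a FAIR assessment: the mapping from sub-principles to field names (`identifier`, `license`, `provenance`) and the function name are assumptions made for this example, and the record is a hypothetical placeholder.

```python
# Illustrative mapping from a few FAIR sub-principles to metadata fields.
# The field names are assumptions for this sketch, not part of FAIR itself.
CHECKS = {
    "F1": "identifier",    # globally unique and persistent identifier
    "R1.1": "license",     # clear and accessible usage license
    "R1.2": "provenance",  # how the output was produced
}


def missing_fair_fields(metadata: dict) -> list[str]:
    """Return the sub-principles whose associated field is absent or empty."""
    return [
        principle
        for principle, field in CHECKS.items()
        if not metadata.get(field)
    ]


# Hypothetical record with no provenance information.
record = {
    "identifier": "https://doi.org/10.1234/example",
    "license": "CC-BY-4.0",
}

missing = missing_fair_fields(record)
print(missing)
```

A check like this only confirms that a field has been filled in. Whether the identifier is truly persistent, or the provenance sufficiently detailed, still needs human judgement.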

Applying Metadata to Research Outputs

Metadata can be stored in different ways.

Repositories such as Zenodo provide a web-based user interface for entering metadata before uploading and publishing research outputs.

Alternatively, metadata can be stored in text files in different formats.

At the simplest level, you can do this manually by including a file containing common metadata fields alongside your research output. Such a file can be very useful for human users of your output, but it is unlikely to meet the requirement of being machine-readable.

A more comprehensive approach adopts existing metadata standards. For example, Zenodo-compatible metadata can be added to a GitHub repository in the form of a JSON file. If you use Frictionless Data Packages to manage data, the metadata is likewise stored in a separate JSON file. That file contains machine-readable information not only about contributors, licenses and the description, but also about the structure of the data, the data types and the relations between the data.
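
The two JSON files described above might look roughly like the following sketch, which builds both as Python dictionaries and serialises them. It assumes the file names `.zenodo.json` and `datapackage.json` and the field names used by Zenodo's deposit metadata and the Frictionless Data Package and Table Schema specifications; check the current documentation of each before relying on them. All titles, names, paths and license identifiers are hypothetical placeholders.

```python
import json

# Sketch of Zenodo-style metadata (assumed to be saved as .zenodo.json).
zenodo_metadata = {
    "title": "Example household survey dataset",
    "description": "Placeholder description of the dataset.",
    "upload_type": "dataset",
    "creators": [{"name": "Doe, Jane", "affiliation": "Example University"}],
    "license": "cc-by-4.0",
    "keywords": ["example", "metadata"],
}

# Sketch of a Frictionless Data Package descriptor (datapackage.json).
datapackage = {
    "name": "example-household-survey",
    "title": "Example household survey dataset",
    "licenses": [
        {"name": "CC-BY-4.0",
         "path": "https://creativecommons.org/licenses/by/4.0/"}
    ],
    "contributors": [{"title": "Jane Doe", "role": "author"}],
    "resources": [
        {
            "name": "households",
            "path": "data/households.csv",
            # Structure and data types of the table.
            "schema": {
                "fields": [
                    {"name": "household_id", "type": "integer"},
                    {"name": "region", "type": "string"},
                ],
                "primaryKey": "household_id",
            },
        },
        {
            "name": "members",
            "path": "data/members.csv",
            "schema": {
                "fields": [
                    {"name": "member_id", "type": "integer"},
                    {"name": "household_id", "type": "integer"},
                    {"name": "age", "type": "integer"},
                ],
                "primaryKey": "member_id",
                # Relation between the two tables.
                "foreignKeys": [
                    {
                        "fields": "household_id",
                        "reference": {
                            "resource": "households",
                            "fields": "household_id",
                        },
                    }
                ],
            },
        },
    ],
}

zenodo_json = json.dumps(zenodo_metadata, indent=2)
datapackage_json = json.dumps(datapackage, indent=2)
print(datapackage_json)
```

The Zenodo-style record covers descriptive fields only, whereas the Data Package descriptor additionally records each table's columns, their types and the foreign key linking `members` to `households`.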

Further Reading

The Turing Way describes the importance of metadata for enabling understanding of data and for providing evidence of provenance and context.


This material is derived from the CCG review of good enough practices, released under a CC-BY 4.0 license.