Choose an open license for a dataset
There are many good resources on how to license data.
The u4RIA forum provides good information on the issues relating to licensing data.
The overall process can be summarised as follows:
- Ensure that you own or have permission to publish the data
- Select the appropriate license
- Include a copy of the license with the data, ideally both in the form of computer-readable metadata and in any accompanying documentation
Permission
Before publishing data, it is important to establish who owns the data and to make sure that the owner fully understands the implications of releasing it under an open license. Many data owners are uncomfortable with releasing data under an open license such as CC-BY-4.0. What often happens in this case, however, is that the data is shared with no license whatsoever, which not only hinders re-use but also exposes the data owner to legal challenge. An open license, by contrast, protects the data owner by limiting liability and disclaiming warranties.
Selecting a license
The choice of license is straightforward, as only two licenses are recommended:
- CC-BY-4.0 for datasets
- CC0-1.0 for any associated metadata
Note that these licenses are not suitable for code or software that accompanies or is associated with the dataset; software and code should be licensed separately.
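The recommendation above amounts to a simple lookup. The Python sketch below is illustrative only: the component names ("dataset", "metadata") and the function name `recommended_license` are hypothetical, and the strings it returns are the SPDX identifiers for the two licenses. Code and software are deliberately rejected, since neither license is suitable for them.

```python
# Licenses recommended in this guide, keyed by a hypothetical component type.
# Values are SPDX short identifiers.
RECOMMENDED_LICENSES = {
    "dataset": "CC-BY-4.0",
    "metadata": "CC0-1.0",
}


def recommended_license(component: str) -> str:
    """Return the recommended SPDX identifier for a dataset component.

    Code and software are rejected on purpose: neither license is suitable
    for them, so they must be licensed separately.
    """
    kind = component.strip().lower()
    if kind in ("code", "software"):
        raise ValueError("not suitable for code; choose a software license separately")
    try:
        return RECOMMENDED_LICENSES[kind]
    except KeyError:
        raise ValueError(f"unknown component type: {component!r}") from None
```

Raising an error for software, rather than returning a default, makes it harder to apply a data license to code by accident.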
Publishing with a license
When you deposit your data in an archive, or share it with colleagues or externally, always include a copy of the license text as a separate text file, together with a link to the original license text (e.g. https://creativecommons.org/licenses/by/4.0/legalcode for CC-BY-4.0).
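As a minimal sketch of this step, assuming you have already saved the full license text locally, the Python function below copies that text unchanged into the dataset folder and writes a short notice linking to the original. The function name `add_license_files` and the file names `LICENSE.txt` and `LICENSE-NOTICE.txt` are hypothetical conventions, not requirements of any archive.

```python
import shutil
from pathlib import Path


def add_license_files(dataset_dir, license_text_file, license_id, license_url):
    """Place the license alongside a dataset.

    Copies the full license text, unchanged, into LICENSE.txt and writes a
    short notice naming the license and linking to the original text.
    """
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)

    # Full license text as a separate plain-text file.
    license_copy = dataset_dir / "LICENSE.txt"
    shutil.copyfile(license_text_file, license_copy)

    # Short human-readable notice with a link to the original license text.
    notice = dataset_dir / "LICENSE-NOTICE.txt"
    notice.write_text(
        f"This dataset is licensed under {license_id}.\n"
        "The full license text is included in LICENSE.txt.\n"
        f"Original license text: {license_url}\n",
        encoding="utf-8",
    )
    return license_copy, notice
```

For example, `add_license_files("survey_data", "cc-by-4.0-legalcode.txt", "CC-BY-4.0", "https://creativecommons.org/licenses/by/4.0/legalcode")` would add both files to a hypothetical `survey_data` folder. Keeping the license text verbatim in its own file avoids any suggestion that the terms have been altered.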
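Archives and repositories normally also provide a way to record the license in the dataset's metadata. Where possible, use the standard identifier (for example, CC-BY-4.0) or a link to the license text rather than free-text wording.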
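One way to record the licenses in machine-readable metadata is schema.org `Dataset` markup in JSON-LD. The sketch below assumes that format; your archive may use a different metadata schema, and the function name `dataset_metadata` is hypothetical. It uses the `license` property for the dataset itself (CC-BY-4.0) and the `sdLicense` property for the metadata record (CC0-1.0), matching the two recommended licenses.

```python
import json

CC_BY_4_0 = "https://creativecommons.org/licenses/by/4.0/"
CC0_1_0 = "https://creativecommons.org/publicdomain/zero/1.0/"


def dataset_metadata(name: str, description: str) -> str:
    """Build schema.org Dataset metadata (JSON-LD) that records the licenses.

    "license" applies to the dataset (CC-BY-4.0);
    "sdLicense" applies to this structured metadata record (CC0-1.0).
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "license": CC_BY_4_0,
        "sdLicense": CC0_1_0,
    }
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    print(dataset_metadata("Example dataset", "Hypothetical dataset used for illustration."))
```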