Metropolia LibGuides: Data management for thesis: When the Thesis is Completed

What happens to the data when the thesis is completed?

When a thesis is completed, a few decisions need to be made about the research data:

Whether the data or a part of it will be destroyed or preserved
Where the preserved data will be stored and whether it needs to be anonymised
Whether the data or a part of it can be made open access

The answers to these questions depend on agreements made with potential collaborators (such as a company or RDI project) regarding data ownership, usage rights, and preservation. They also depend on the consent given by the research participants.

Unless otherwise agreed upon, the student owns the research data of their thesis. In some cases, ownership of the data may be transferred to the collaborating organization, such as a company or Metropolia. The student may also grant usage rights to the data.

Data Destruction

Unnecessary files should be destroyed when the thesis is completed. If your research involves human subjects, you must specify in the privacy notice when and how the data will be destroyed after the research is finished.

If your research data contains personal information or other confidential data, destroy that data promptly once you no longer need it for your thesis, because it poses a privacy risk. Simply "deleting" the files and emptying the trash bin is not sufficient for data destruction. Follow the guidelines provided by the helpdesk for proper data destruction. Paper-based materials should be disposed of in a secure waste bin.

Opening or Reusing the Dataset

If you have collected a high-quality and interesting research dataset, you may want to reuse it yourself or offer it for others to use after completing your thesis. Make sure in advance that reuse will be possible through:

agreements made with collaborators
research permits
informing the research participants
following good data management practices

Obtaining permissions retroactively can prove impossible, so if you plan to reuse your dataset, account for this in your data management plan from the start.

Checklist for Dataset Preservation

What has been agreed upon regarding ownership of the dataset? By default, you, as the thesis author, own the collected dataset. If there are multiple authors, you collectively own the dataset. Ownership can also be transferred through agreements, such as to a company or Metropolia.
What has been agreed upon regarding the preservation of the dataset? Even if you are the sole owner of the dataset, you can grant usage rights to a collaborating company or Metropolia. In this case, it is advisable to agree in advance on how long the data will be preserved, where it will be stored, and who is responsible for it.
What consents have been obtained from the research participants? If you wish to preserve the dataset after your thesis has been completed, and your research involves human subjects, obtain consent from the participants for the preservation and reuse of the data.
If the dataset contains personal information or other confidential data, such as trade secrets, the dataset should be anonymised. Anonymisation should be done as soon as you no longer need these details.

Opening the Dataset

Opening the dataset refers to making the research dataset freely available for others to use. This typically involves depositing the dataset in an open data repository. Opening the dataset can be facilitated by utilizing Creative Commons licenses, through which the dataset owner grants usage rights to the dataset.

Note that opening the dataset requires good data management practices from the very beginning. If you are interested in opening your dataset after completing your thesis, discuss this with your supervisor as early as possible. The plan to open the dataset must be taken into account in collaboration agreements, in the information given to research participants, and in the anonymisation of any sensitive data.

Dataset Anonymisation

Dataset anonymisation means processing the dataset so that it no longer contains any identifying information. For personal data, this means that individuals cannot reasonably be identified from the dataset. Organizational information or other confidential data can also be anonymised.

Even if you do not directly collect personal data from the research participants, it may still be possible to identify them from the dataset. For example, an anonymous survey may not be truly anonymous if participants can reveal information about themselves in open-ended responses or if the survey form records the respondent's IP address. Such a dataset is not anonymous and is subject to data protection laws.
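As a rough illustration of the point about technical metadata, the sketch below drops columns such as an IP address from survey rows before they are stored. The column names and the function name are assumptions made for this example, not the export format of any particular survey tool, so check what your own tool actually records. Open-ended responses still have to be read and cleaned by hand; this sketch does not cover them.

```python
# Hypothetical column names; check what your survey tool actually exports.
TECHNICAL_COLUMNS = {"ip_address", "user_agent", "email"}


def strip_technical_columns(rows):
    """Return copies of the rows without technical metadata columns.

    `rows` is a list of dicts, for example as produced by csv.DictReader.
    """
    return [
        {key: value for key, value in row.items()
         if key.lower() not in TECHNICAL_COLUMNS}
        for row in rows
    ]


# Example with made-up data.
responses = [
    {"ip_address": "192.0.2.10", "q1": "Agree", "q2": "No comment"},
    {"ip_address": "192.0.2.11", "q1": "Disagree", "q2": "It was fine"},
]
cleaned = strip_technical_columns(responses)
```

The original rows are left untouched, so the unfiltered export still has to be destroyed separately once it is no longer needed.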

Techniques for anonymisation include:

Removal of single pieces of information. Removed information can be marked in the dataset using placeholders such as [information removed].
Reclassification of data. For instance, if you have collected precise ages or occupations, you can replace them with age groups or occupation categories.
Fictitious names. If the dataset contains names, instead of removing them, you can replace them with fictional names.
Generalization. You can transform precise information into more general terms. For example, "AIDS" can be replaced with "disease," and "Metropolia" can be replaced with "university of applied sciences."
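The techniques listed above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the record fields, the lookup tables, the made-up respondent name, and the function names are all hypothetical, and a real dataset needs a careful manual review in addition to any automated processing.

```python
import re

# Hypothetical lookup tables; adapt them to your own dataset.
PSEUDONYMS = {"Maija Virtanen": "Anna"}  # made-up respondent name
GENERAL_TERMS = {
    "AIDS": "disease",
    "Metropolia": "university of applied sciences",
}
PLACEHOLDER = "[information removed]"


def age_group(age: int) -> str:
    """Reclassification: replace a precise age with a ten-year band."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def anonymise_text(text: str) -> str:
    """Apply fictitious names, generalization, and removal to free text."""
    for real_name, fake_name in PSEUDONYMS.items():
        text = text.replace(real_name, fake_name)
    for term, general in GENERAL_TERMS.items():
        text = re.sub(rf"\b{re.escape(term)}\b", general, text)
    # Removal: replace anything that looks like an e-mail address.
    return re.sub(r"\S+@\S+", PLACEHOLDER, text)


def anonymise_record(record: dict) -> dict:
    """Return a new record with identifiers replaced or coarsened."""
    return {
        "name": PSEUDONYMS.get(record["name"], PLACEHOLDER),
        "age_group": age_group(record["age"]),
        "answer": anonymise_text(record["answer"]),
    }


# Example with made-up data.
record = {
    "name": "Maija Virtanen",
    "age": 34,
    "answer": "I am Maija Virtanen and I study at Metropolia. "
              "Write to maija@example.com",
}
result = anonymise_record(record)
```

Keeping the lookup tables separate from the code makes it easier to review what was replaced, but note that a table mapping real names to fictitious ones is itself personal data and should be destroyed when it is no longer needed.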

Anonymisation and personal data
Finnish Social Science Data Archive's guidelines for anonymising data