The processing of identifiable data requires particular care. Data is identifiable if it can be used to identify an individual person or a cluster of persons, such as a family.
Identifiable data can be used in research and development activities when the processing is purposeful, planned and justified, and there is a legal basis for it, such as the participant's consent or research carried out in the public interest.
Personal data must be removed from the dataset or anonymised as soon as they are no longer needed.
The content of this page is based on the Data Management Guidelines of the Finnish Social Science Data Archive (FSD). You can find Metropolia's privacy guidelines and templates on OMA intranet.
Identifiable data includes any information that can directly or indirectly identify a person. Research or development data may also contain identifying information about the research subject's close associates or other individuals. Information that identifies them is also considered identifiable data.
Direct identifiers: a person's full name, social security number, email address containing the personal name, and biometric identifiers (fingerprints, facial image, voice patterns, iris scan, hand geometry or manual signature).
Strong indirect identifiers: e.g. a postal address, phone number, vehicle registration number, unusual job title, very rare disease or various unique identifiable codes, such as a student ID number.
Indirect identifiers: information that on its own is not enough to identify someone but, when linked with other available information, could be used to deduce the identity of a person. These include, for example, gender, age, education, occupational status, household composition, income, marital status, mother tongue, nationality, ethnic background, or place of work or study. When the target group of a study is relatively small, an individual can be identified fairly easily by combining indirect background information.
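The risk from combining indirect identifiers can be checked roughly by counting how many records share each combination of background variables. The following is a minimal sketch, not a complete disclosure-risk assessment. The function name `small_groups`, the field names, the sample records and the threshold `k=3` are all assumptions made for illustration; a suitable threshold depends on the dataset and should be decided case by case.

```python
from collections import Counter


def small_groups(records, indirect_identifiers, k=3):
    """Return the combinations of indirect identifiers shared by fewer than k records.

    The threshold k is an illustrative assumption, not a rule from the guidelines.
    """
    counts = Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical respondents, invented for this example.
respondents = [
    {"gender": "F", "age_group": "30-39", "occupation": "nurse"},
    {"gender": "F", "age_group": "30-39", "occupation": "nurse"},
    {"gender": "F", "age_group": "30-39", "occupation": "nurse"},
    {"gender": "M", "age_group": "60-69", "occupation": "midwife"},
]

risky = small_groups(respondents, ["gender", "age_group", "occupation"], k=3)
print(risky)
```

Any combination returned by the function describes a very small group, so those records may need coarser categories or removal of a variable before the data can be treated as anonymous.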
Sensitive personal data refers to the special categories of personal data defined by the General Data Protection Regulation (GDPR), which include information revealing:
Sensitive data must be protected with particular care, as their processing can pose risks to fundamental rights of individuals. Therefore, their processing is generally prohibited. However, there are exceptions to this prohibition, one of which is the explicit consent of the individual for processing such sensitive personal data.
Please note that storing special categories of personal data in cloud services is prohibited at Metropolia.
The principle of minimization is to avoid the collection of unnecessary identifiable data. This principle should be followed when planning research.
The processing of research data containing identifiers must be planned thoroughly and executed carefully. Data protection must not be jeopardised, for example, by careless preservation or insecure digital transfers.
General protective measures in processing personal data include pseudonymisation, anonymisation and storage limitation.
Pseudonymisation refers to the removal or replacement of identifiers with pseudonyms or codes, which are kept separate from the data and protected by technical and organisational measures. Organisational measures refer to the protection of physical environment and documented access control. Technical measures refer to secure data storage solutions. Pseudonymous data become anonymous when separately kept identifying information (decryption key, personal data and information on the techniques used to pseudonymise the data) is destroyed.
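As a rough illustration of the idea, the sketch below replaces a direct identifier with a random code and returns the code key as a separate table. The function name `pseudonymise`, the field names `name` and `participant_code`, and the code format are hypothetical. The sketch does not cover the organisational and technical measures themselves: in practice the key table has to be stored apart from the data, with restricted and documented access.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Replace the identifier in each record with a random participant code.

    Returns (pseudonymised_records, key_table). The key table maps codes back
    to identifiers and is meant to be stored separately from the data.
    """
    code_for = {}   # identifier -> code, used while processing
    key_table = {}  # code -> identifier, to be kept apart from the data
    output = []
    for record in records:
        identifier = record[id_field]
        if identifier not in code_for:
            code = "P" + secrets.token_hex(4)
            while code in key_table:  # regenerate in the unlikely case of a clash
                code = "P" + secrets.token_hex(4)
            code_for[identifier] = code
            key_table[code] = identifier
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["participant_code"] = code_for[identifier]
        output.append(cleaned)
    return output, key_table


# Hypothetical interview records, invented for this example.
interviews = [
    {"name": "Jane Doe", "answer": "yes"},
    {"name": "John Roe", "answer": "no"},
    {"name": "Jane Doe", "answer": "maybe"},
]

pseudonymised, key_table = pseudonymise(interviews)
```

Random codes are used here instead of codes derived from the name, so that the code alone reveals nothing about the person. Destroying `key_table` removes the link back to individuals, although indirect identifiers left in the data may still need separate treatment.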
Data anonymisation refers to the process of handling data in such a way that it no longer contains any identifiable information. In the case of personal data, this means that individuals cannot be reasonably identified from the dataset. Similarly, organisational information or other confidential data can also be removed from the dataset or anonymised.
Even if you do not collect personal data directly from the research subjects, it may still be possible to identify them from the dataset. For example, an anonymous survey may not be truly anonymous if the research subjects can reveal information about themselves in open-ended responses or if the survey form records the respondent's IP address. Such data is not anonymous and is subject to data protection laws.
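A simple pre-check along these lines can be sketched as follows. It drops technical metadata such as a recorded IP address and flags open-ended answers that appear to contain an email address or a phone number. The field names (`ip_address`, `q1`, `q2`), the function names and the two regular expressions are assumptions for illustration. Pattern matching of this kind cannot detect names or descriptive details, so it only supports manual review and does not replace it.

```python
import re

# Deliberately simple patterns, assumed for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s-]{6,}\d")


def strip_metadata(response, metadata_fields=("ip_address",)):
    """Return a copy of the response without technical metadata fields."""
    return {k: v for k, v in response.items() if k not in metadata_fields}


def needs_review(text):
    """Return True if the text appears to contain an email address or phone number."""
    return bool(EMAIL.search(text) or PHONE.search(text))


# Hypothetical survey response, invented for this example.
response = {
    "ip_address": "192.0.2.10",
    "q1": "Contact me at jane.doe@example.org",
    "q2": "No comments",
}

cleaned = strip_metadata(response)
flagged = [field for field, text in cleaned.items() if needs_review(text)]
print(flagged)
```

Where the survey tool allows it, switching off IP address logging before data collection is preferable to removing the addresses afterwards.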
There is no single anonymisation technique suitable for all types of data. Anonymisation should always be planned case by case.
You can get a clear picture of the anonymisation process in both qualitative and quantitative research with the help of the following questions:
Techniques for anonymisation include, for example:
Personal data that are no longer needed to conduct the research should be erased as soon as possible. For example, names, addresses and other similar identifiers needed at the data collection stage should be removed immediately after they are no longer necessary to carry out the research. If personal identity codes were used to link data, they should also be deleted when they are no longer needed.
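For tabular data, removing identifiers that were only needed during data collection can be as simple as writing a copy of the file without those columns. The sketch below uses Python's standard `csv` module; the function name `drop_columns`, the column names and the sample row are hypothetical. It only produces the cleaned copy. The original file, and any backups or exports that still contain the identifiers, have to be erased separately.

```python
import csv
import io


def drop_columns(csv_text, columns_to_drop):
    """Return the CSV content with the given columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name not in columns_to_drop]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical collection file, invented for this example.
raw = "name,address,age_group,answer\nJane Doe,Example Street 1,30-39,yes\n"

cleaned_csv = drop_columns(raw, {"name", "address"})
print(cleaned_csv)
```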
Metropolia Library and Information Services