The documentation and description of research data in a thesis often refer to the same concept, which involves describing how the research data was produced, what it contains, and how it has been processed.
Metadata, or descriptive information, is data about the research data, usually recorded in a standardized format. These terms are closely related and sometimes used interchangeably. For the purposes of a thesis, however, the differences between these concepts matter less than the intention behind documentation and metadata and what they aim to achieve.
Without sufficiently detailed information about its context, research data is often useless. Consider, for example, measurement results stored in an Excel spreadsheet. If there is no information about what was measured, with what instruments, and on what scale, the data is reduced to mere numbers in a table that cannot be understood or interpreted. Description supplies this contextual information.
Therefore, describing data is an essential part of responsible data management because without it, the data can be difficult or even impossible to interpret.
Think in advance about the information that needs to be recorded for each dataset (e.g., interview data or measurement data) to make the data understandable. The data should remain comprehensible even after a long period of time and when examined by someone other than yourself. If the data can be misinterpreted, it becomes unreliable, because any misreading carries over into the results derived from it.
The thesis supervisor should also be able to understand the research data.
Description and documentation are also significant for verifying results and for research reproducibility. They help confirm that the results of the thesis are reliable and could be replicated if necessary.
Sufficient descriptive information and documentation are particularly important if you are interested in making the data openly available or using it for further purposes after completing the thesis.
Description is always specific to the dataset, as its purpose is to make the data understandable and to ensure it is interpreted correctly. The following levels of description are examples of the information that should be recorded.
Thesis-level description refers to basic information about the research conducted in the thesis, such as:
File-level description refers to the description of individual files, marking down their characteristics. Its purpose is to facilitate finding the correct information and maintain file integrity. If there are only a few separate files, file-level description may not be necessary in a thesis. Examples of file-level description information include:
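As an illustration of how file-level information could be collected automatically, the following is a minimal Python sketch. The recorded fields (name, size, modification time, SHA-256 checksum) and the function names `describe_file` and `describe_folder` are assumptions chosen for this example, not a prescribed set. The checksum is included because it supports the file-integrity goal mentioned above: if a file changes, its checksum changes too.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path: Path) -> dict:
    """Return a small file-level description for one file (illustrative fields)."""
    data = path.read_bytes()
    stat = path.stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "name": path.name,
        "size_bytes": stat.st_size,
        "modified_utc": modified.isoformat(timespec="seconds"),
        # Checksum of the file contents; it changes if the file is altered.
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def describe_folder(folder: Path) -> list[dict]:
    """Describe every regular file directly inside `folder`, sorted by name."""
    return [describe_file(p) for p in sorted(folder.iterdir()) if p.is_file()]
```

Calling `describe_folder(Path("data"))` on a hypothetical `data` folder would return one dictionary per file, which can then be saved alongside the dataset. Reading whole files into memory is fine for small thesis datasets; very large files would need chunked hashing instead.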
Variable-level description refers to the description of variables in the dataset. In addition to a list of variables and their measurement scales, it is good practice to note any notations, abbreviations, and codes used.
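A variable-level description is often kept as a simple codebook. The sketch below shows one possible shape for it in Python. The class `VariableDescription`, its field names, and the two example variables are hypothetical and only serve to illustrate the idea of recording the scale, unit, and codes for each variable.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class VariableDescription:
    name: str        # column name as it appears in the data file
    label: str       # human-readable meaning of the variable
    scale: str       # measurement scale, e.g. nominal, ordinal, interval, ratio
    unit: str = ""   # unit of measurement, if any
    codes: str = ""  # meaning of coded values and abbreviations


def codebook_to_csv(variables: list[VariableDescription]) -> str:
    """Serialize the variable descriptions as CSV text with a header row."""
    buffer = io.StringIO()
    column_names = [f.name for f in fields(VariableDescription)]
    writer = csv.DictWriter(buffer, fieldnames=column_names, lineterminator="\n")
    writer.writeheader()
    for variable in variables:
        writer.writerow(asdict(variable))
    return buffer.getvalue()


# Hypothetical example variables, not taken from any real dataset.
codebook = [
    VariableDescription("temp_c", "Room temperature", "interval",
                        unit="degrees Celsius"),
    VariableDescription("satisf", "Satisfaction rating", "ordinal",
                        codes="1=low; 5=high; 99=missing"),
]
print(codebook_to_csv(codebook))
```

The resulting CSV text can be saved next to the data file so that anyone opening the dataset later can look up what each column and code means.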
To ensure the reliability and integrity of the data, it is also important to document how the data has been processed and modified.
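One lightweight way to document processing is to append a short entry to a log each time the data is changed. The following Python sketch assumes a JSON Lines log file; the function names `log_processing_step` and `read_processing_log` and the entry fields are illustrative assumptions rather than an established convention.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_processing_step(log_path: Path, step: str, details: str) -> dict:
    """Append one processing step to a JSON Lines log and return the entry."""
    entry = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "step": step,
        "details": details,
    }
    # Append mode keeps earlier entries, so the log preserves the full history.
    with log_path.open("a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry


def read_processing_log(log_path: Path) -> list[dict]:
    """Read all logged steps back in the order they were written."""
    with log_path.open(encoding="utf-8") as log_file:
        return [json.loads(line) for line in log_file if line.strip()]
```

For example, `log_processing_step(Path("processing_log.jsonl"), "anonymize", "Removed names from interview transcripts")` would add one dated line to a hypothetical log file. A plain text file or a notebook kept by hand serves the same purpose; what matters is that every modification is recorded.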
In addition to the above, contextual information or paradata can also be recorded if it is relevant to the dataset.
Contextual information refers to data about external conditions that prevailed during data collection and could have affected the data. These may include societal events, natural disasters, or accidents.
Paradata, on the other hand, is empirical data about the data collection process itself. For example, it could include the duration of different parts of an interview, response delays, or visual observations made by the interviewer during the interview.
There are several options for storing description data. For example:
Metropolia Library and Information Services