Data documentation and data description usually refer to the same thing: recording how research or development data has been generated, what it contains, and how it has been processed. Metadata is data about the data: details about the dataset, typically presented in a standardized format. These terms are closely related and are sometimes used interchangeably.
Without sufficiently detailed contextual information, research datasets are often useless. Consider measurement results stored in an Excel spreadsheet. Without information about what was measured, which instruments were used, and the scale of measurement, the dataset is just numbers in a table that cannot be understood or interpreted. Data description supplies this contextual information.
The project manager ensures that descriptive information on the data collected in the project is stored appropriately.
Data description is important because:
Memory is short: As time passes, the details of data collection and processing are forgotten unless documented. Recording this information prevents loss of valuable knowledge.
Proper interpretation by others: Data may be handled by multiple individuals, or project managers may change. The possibility of misinterpretation makes the data unreliable. Descriptive information allows others to correctly understand and interpret the data.
Verification and reproducibility of results: Derived results should be verifiable and reproducible. Proper data description enables researchers to validate findings and reproduce analyses.
High-quality description for open or secondary use: If data is intended to be opened or reused, the description needs to be of high quality. Comprehensive and well-documented descriptions increase the reliability and usefulness of the data for other researchers or projects.
In summary, data description is crucial for preserving the context, ensuring proper interpretation, enabling verification and reproducibility, and facilitating the open and secondary use of data.
There is no one-size-fits-all approach to data description, as the information to be described should be selected based on the specific dataset.
Important information to be documented includes:
Different research fields may have their own metadata standards and formats; use them where a suitable standard exists. Alternatively, use the data description template provided below and adapt it to your specific dataset. For large datasets, machine-readable standards are recommended.
You can find metadata standards for different fields on the following websites:
When describing research data, focus on the data itself, not on conclusions or publications based on it. Describe the data during the project, not only when the project is ending. Aim for a description accurate enough that a person unfamiliar with the data can understand what it is about. The more accurate the description of the data, the better the quality of the metadata.
Also save the following as a TXT or PDF file in the same directory as the data and metadata:
Save all language versions.
Save the metadata with the data, for example as a text file (TXT), following the model below.
Name of data: Name your data as descriptively as possible.
Field of science: Select the correct field from the Fields of science 2010 classification on the Statistics Finland website.
Data creators: The individuals who are responsible for the content of the data, usually the project managers [Name, email, organisation and unit].
Other data collectors, storers and processors: Name and organisation.
Funders: The bodies or organisations that have taken part in producing the data as project funders.
Original purpose: Information on the project for which the data were collected, the theoretical framework and the operationalisations of concepts under study.
Time of data collection: Starting and ending dates.
What data is collected and how: Describe the collected data as informatively as possible and provide the method used in data collection. Describe e.g. the data population, i.e. the group of people or things which were examined or which the results are based on.
Amount of data and description of files: Describe the amount of data and list the data files as a directory (file name, format and what each file contains).
Edits to the data: Describe how the data has been edited, e.g. accuracy of transcriptions, anonymisation, elimination of sections or variables, etc.
Produced publications: List the publications made based on the data.
Language: The language of the data.
Access rights: For open data, Metropolia recommends the Creative Commons Attribution 4.0 (CC BY 4.0) licence. Also log any special access rights and who provides further information, as necessary.
Data ownership: Who owns the data. In shared projects, data ownership is usually determined in the cooperation agreement.
Data location: Where the data is stored.
Subject headings: Describe your data using 3–5 subject headings. Use the General Finnish Ontology YSO.
List of all data collected in the project: E.g. individual interviews, recordings of interactions, diary entries, field notes, newspaper clippings.
Information on each data unit: For a newspaper clipping, for example, the name of the paper, date, pages, author and title/topic. For interviews, this would be background information on the interviewee and any other relevant background details. Basic information on each data unit should be included in the unit itself, e.g. at the start of the interview transcription, as well as in a separate list.
Source data: If the data are not collected through surveys or interviews, save information on data sources, e.g. books, articles and register information.
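The metadata model above can also be filled in programmatically. The following is a minimal Python sketch, not an official tool: the field labels are copied from the model, while the function names, the placeholder text and the default file name metadata.txt are assumptions made for illustration. It writes one "Label: value" line per field and marks fields that have not been documented yet.

```python
from pathlib import Path

# Field labels copied from the metadata model above, in the same order.
TEMPLATE_FIELDS = [
    "Name of data",
    "Field of science",
    "Data creators",
    "Other data collectors, storers and processors",
    "Funders",
    "Original purpose",
    "Time of data collection",
    "What data is collected and how",
    "Amount of data and description of files",
    "Edits to the data",
    "Produced publications",
    "Language",
    "Access rights",
    "Data ownership",
    "Data location",
    "Subject headings",
]

# Hypothetical placeholder shown for fields that have no value yet.
MISSING = "[not yet documented]"


def render_metadata(values):
    """Return the metadata as 'Label: value' lines, one per template field."""
    unknown = set(values) - set(TEMPLATE_FIELDS)
    if unknown:
        # Reject labels that are not part of the model, e.g. typos.
        raise KeyError(f"Not in the template: {sorted(unknown)}")
    lines = [f"{label}: {values.get(label, MISSING)}" for label in TEMPLATE_FIELDS]
    return "\n".join(lines) + "\n"


def write_metadata(values, data_dir, filename="metadata.txt"):
    """Write the rendered metadata as a TXT file next to the data."""
    target = Path(data_dir) / filename
    target.write_text(render_metadata(values), encoding="utf-8")
    return target
```

Calling write_metadata({"Name of data": "..."}, "path/to/data") would create the file in the data directory. Fields left out of the dictionary stay visible as undocumented, which makes gaps easy to spot during the project.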
The following information should be documented on variables:
If the variables or the values of the variables are dissimilar to the questions or response alternatives in the questionnaire, these dissimilarities should be explained.
Changes and edits: Record information on any changes and edits made to the data during processing (e.g. removal of duplicates, removal of exceptional values). Some of the descriptive information can be documented in the data file itself.
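One way to make sure such edits are recorded is to log them at the moment they are made. The sketch below is an illustration only: the function names, the log wording and the representation of rows as tuples are assumptions, not part of any prescribed workflow. Each processing step appends a dated entry to a change log that can later be saved with the metadata.

```python
from datetime import date


def remove_duplicates(rows, change_log):
    """Drop exact duplicate rows (keeping the first) and log how many were removed."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    removed = len(rows) - len(unique)
    change_log.append(
        f"{date.today().isoformat()}: removed {removed} duplicate row(s); "
        f"{len(unique)} remain"
    )
    return unique


def remove_out_of_range(rows, column, low, high, change_log):
    """Drop rows whose value in `column` lies outside [low, high] and log the edit."""
    kept = [row for row in rows if low <= row[column] <= high]
    removed = len(rows) - len(kept)
    change_log.append(
        f"{date.today().isoformat()}: removed {removed} value(s) outside "
        f"[{low}, {high}] in column {column}; {len(kept)} remain"
    )
    return kept


if __name__ == "__main__":
    # Made-up sample rows, for illustration only: (identifier, measurement).
    log = []
    data = [("a", 1.0), ("a", 1.0), ("b", 2.0), ("c", 999.0)]
    data = remove_duplicates(data, log)
    data = remove_out_of_range(data, 1, 0.0, 100.0, log)
    print("\n".join(log))
```

Because every function that changes the data also writes the log entry, the record of edits cannot drift out of step with the processing that was actually done.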
Contextual information refers to the external circumstances and events that may have affected the units of observation at the time of data collection. For example, the economic situation, political events, changes in society, and sudden natural disasters or accidents at the time of data collection may affect the responses of research participants.
Contextual information should be added to the metadata as required.
Metropolia Library and Information Services