Library Support Services for RDI

Data description, documentation and metadata

Data documentation and data description usually refer to the same thing: providing information on how research or development data was generated, what it contains, and how it has been processed. Metadata is data about data: details about a dataset, typically presented in a standardised format. The terms are closely related and are sometimes used interchangeably.

Why is data described?

Without sufficiently detailed contextual information, research datasets are often useless. Consider measurement results stored in an Excel spreadsheet. Without information about what was measured, which instruments were used, and the scale of measurement, the dataset is just numbers in a table that cannot be understood or interpreted. Description supplies this contextual information.

The project manager ensures that the descriptive information of the collected data in the project is stored appropriately.

Data description is important because:

  1. Memory is short: As time passes, the details of data collection and processing are forgotten unless documented. Recording this information prevents loss of valuable knowledge.

  2. Proper interpretation by others: Data may be handled by multiple individuals, or project managers may change. The possibility of misinterpretation makes the data unreliable. Descriptive information allows others to correctly understand and interpret the data.

  3. Verification and reproducibility of results: Derived results should be verifiable and reproducible. Proper data description enables researchers to validate findings and reproduce analyses.

  4. High-quality description for open or secondary use: If data is intended to be opened or reused, the description needs to be of high quality. Comprehensive and well-documented descriptions increase the reliability and usefulness of the data for other researchers or projects.

In summary, data description is crucial for preserving the context, ensuring proper interpretation, enabling verification and reproducibility, and facilitating the open and secondary use of data.

What information is described?

There is no one-size-fits-all approach to data description, as the information to be described should be selected based on the specific dataset.

Important information to be documented includes:

  1. Project level: The purpose of the data collection, the methodology used, details about the data collection process (who, where, when, and with what tools), and the access and terms of use.
  2. File level: File properties such as format, size, and name, connections between files (e.g., different versions), how files are organized in folders, and how folders are structured and named. Describing files is part of file management, which aims to facilitate finding the correct information and maintaining file integrity.
  3. Variable level: A list of variables with their descriptions, measurement scales, used labels, abbreviations, and codes. It is also important to document how the data has been processed and modified to ensure reliability and data integrity.
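The file-level description above can be partly automated. The following is a minimal sketch, not a prescribed tool: it walks a data folder and records each file's relative path, format and size. The SHA-256 checksum is an addition of this sketch, meant as one way to notice later changes to a file. The function names, column names and example file names are all hypothetical.

```python
import csv
import hashlib
import tempfile
from pathlib import Path

FIELDNAMES = ["path", "format", "size_bytes", "sha256"]


def describe_files(data_dir: Path) -> list[dict]:
    """Return one record per file under data_dir: path, format, size, checksum."""
    records = []
    for path in data_dir.rglob("*"):
        if not path.is_file():
            continue
        records.append({
            # The relative path also documents the folder structure.
            "path": path.relative_to(data_dir).as_posix(),
            "format": path.suffix.lstrip(".").lower() or "unknown",
            "size_bytes": path.stat().st_size,
            # Assumption of this sketch: a SHA-256 checksum as the integrity record.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return sorted(records, key=lambda record: record["path"])


def write_inventory(records: list[dict], target: Path) -> None:
    """Write the inventory to a CSV file."""
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(records)


if __name__ == "__main__":
    # Hypothetical folder and file names, created in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp) / "data"
        (data_dir / "raw").mkdir(parents=True)
        (data_dir / "raw" / "measurements_v1.csv").write_bytes(b"id,value\n1,0.5\n")
        (data_dir / "raw" / "measurements_v2.csv").write_bytes(b"id,value\n1,0.5\n2,0.7\n")
        for record in describe_files(data_dir):
            print(record["path"], record["format"], record["size_bytes"])
        # Write the inventory outside data_dir so it does not list itself.
        write_inventory(describe_files(data_dir), Path(tmp) / "file_inventory.csv")
```

The sketch reads each file fully into memory to compute the checksum, which is acceptable for small files only. What each file contains and how versions relate to each other still has to be described by hand.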

Many research fields have their own metadata standards and formats; use them where an appropriate standard exists. Otherwise, use the data description model below and adapt it to your dataset. For large datasets, machine-readable standards are recommended.

You can find metadata standards for different fields on the following websites:

Data description model

When describing research data, focus on the data itself, not on the conclusions or publications based on it. Describe the data during the project, not only at its end. Aim for a description accurate enough that a person unfamiliar with the data can understand what it is about. The more accurate the description, the better the quality of the metadata.

Also save the following as a TXT or PDF file in the same directory as the data and metadata:

  • instructions and other documents provided to the people collecting and processing data
  • invitations to write, cover letters
  • interview questions, questionnaires, interview frameworks

Save all language versions.

Save the metadata with the data, e.g. as a text file (TXT), following the model below.

Basic information

Name of data: Name your data as descriptively as possible.

Field of science: Select the correct field from the Fields of science 2010 classification on the Statistics Finland website.

Data creators: The individuals who are responsible for the content of the data, usually the project managers [Name, email, organisation and unit].

Other data collectors, storers and processors: Name and organisation.

Funders: The bodies or organisations that have taken part in producing the data as project funders.

Data

Original purpose: Information on the project for which the data were collected, the theoretical framework and the operationalisations of concepts under study.

Time of data collection: Starting and ending dates.

What data is collected and how: Describe the collected data as informatively as possible and provide the method used in data collection. Describe e.g. the data population, i.e. the group of people or things which were examined or which the results are based on.

Amount of data and description of files: Describe the amount of data and list the data files as a directory (file name, format and what each file contains).

Edits to the data: Describe how the data has been edited, e.g. accuracy of transcriptions, anonymisation, elimination of sections or variables, etc.

Produced publications: List the publications made based on the data.

Language: The language of the data.

Access rights: For open data, Metropolia recommends the Creative Commons Attribution 4.0 (CC BY 4.0) licence. Where necessary, also record any special access rights and who can provide further information.

Data ownership: Who owns the data. In shared projects, data ownership is usually determined in the cooperation agreement.

Data location: Where the data is stored.

Subject headings: Describe your data using 3–5 subject headings. Use the General Finnish Ontology (YSO).
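One possible way to keep such a metadata file consistent is to generate it from a fixed list of fields. The sketch below copies the field labels from the model above; the names `MODEL`, `render_metadata` and `missing_fields`, the placeholder text and all example values are hypothetical and only illustrate the idea.

```python
# Field labels copied from the data description model, grouped by its headings.
MODEL = {
    "Basic information": [
        "Name of data",
        "Field of science",
        "Data creators",
        "Other data collectors, storers and processors",
        "Funders",
    ],
    "Data": [
        "Original purpose",
        "Time of data collection",
        "What data is collected and how",
        "Amount of data and description of files",
        "Edits to the data",
        "Produced publications",
        "Language",
        "Access rights",
        "Data ownership",
        "Data location",
        "Subject headings",
    ],
}

PLACEHOLDER = "[not yet described]"  # hypothetical marker for empty fields


def missing_fields(values: dict[str, str]) -> list[str]:
    """List the model fields that have no value yet."""
    return [
        field
        for fields in MODEL.values()
        for field in fields
        if not values.get(field, "").strip()
    ]


def render_metadata(values: dict[str, str]) -> str:
    """Render the metadata as plain text in the order of the model."""
    lines = []
    for section, fields in MODEL.items():
        lines.append(section)
        lines.append("")
        for field in fields:
            lines.append(f"{field}: {values.get(field, '').strip() or PLACEHOLDER}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical example values.
    example = {
        "Name of data": "Example interview data on library use (hypothetical)",
        "Language": "Finnish",
        "Access rights": "CC BY 4.0",
    }
    print(render_metadata(example))
    print("Still to be described:", ", ".join(missing_fields(example)))
```

Because empty fields stay visible in the output, running the script during the project shows what still needs to be described. The returned text can be saved as a TXT file next to the data.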

Qualitative data: description of data unit

List of all data collected in the project: E.g. individual interviews, recordings of interactions, diary entries, field notes, newspaper clippings.

Information on each data unit: For a newspaper clipping, for example, the name of the paper, date, pages, author and title/topic. For interviews, the interviewee's background information and any other relevant background details. Basic information on each data unit should be included in the unit itself, e.g. at the start of the interview transcription, as well as in a separate list.
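If the basic information on each data unit is kept in one place, both the header inside the unit and the separate list can be produced from it. The following sketch shows one way to do this, assuming each unit is a small dictionary; the field names (`unit_id`, `type`, `paper`, `date`) and the example values are hypothetical.

```python
import csv
import io


def unit_header(unit: dict[str, str]) -> str:
    """Format the basic information of one data unit as a header block."""
    lines = [f"{key}: {value}" for key, value in unit.items()]
    return "\n".join(lines) + "\n" + "-" * 40 + "\n"


def with_header(unit: dict[str, str], body: str) -> str:
    """Place the header at the start of the unit, e.g. a transcription."""
    return unit_header(unit) + body


def unit_list_csv(units: list[dict[str, str]]) -> str:
    """Build the separate list of all data units as CSV text."""
    fieldnames: list[str] = []
    for unit in units:
        for key in unit:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # Units of different types may lack some fields; those cells stay empty.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(units)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical data units.
    units = [
        {"unit_id": "I01", "type": "individual interview", "date": "2023-03-14"},
        {"unit_id": "N01", "type": "newspaper clipping", "date": "2023-02-01", "paper": "Example Daily"},
    ]
    print(with_header(units[0], "Interviewer: ...\nInterviewee: ...\n"))
    print(unit_list_csv(units))
```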

Source data: If the data are not collected through surveys or interviews, save information on data sources, e.g. books, articles and register information.

Quantitative data: description of data unit

The following information should be documented on variables:

  • number of variables and units of observation
  • list of variables with the name and label of each variable as well as its location in the file and its values and value labels
  • frequency distribution of each variable
  • information on the classifications used, e.g. "main categories of the ISCO-88 were used in the occupational classification" or "country codes: 3-digit ISO 3166"
  • meanings of abbreviations used
  • codings for missing data
  • information on constructed variables (e.g. how the weight variables and sum variables were calculated)
  • recoding and standardising of variables
  • data protection measures taken

If the variables or their values differ from the questions or response alternatives in the questionnaire, explain the differences.
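Part of the variable documentation listed above can be derived from the data file itself. The sketch below is one possible approach, assuming the data is available as CSV text with a header row. It counts variables and units of observation and records each variable's position, label and frequency distribution. The missing-data codes (`""` and `"-9"`), the variable names and the labels are assumptions made for the example; value labels, classifications and constructed variables still need to be documented by hand.

```python
import csv
import io
from collections import Counter

# Assumption of this sketch: empty cells and "-9" denote missing data.
MISSING_CODES = {"", "-9"}


def build_codebook(csv_text: str, labels: dict[str, str]) -> dict:
    """Summarise a CSV dataset: counts, variable list and frequencies."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    variables = list(reader.fieldnames or [])
    codebook = {
        "number_of_variables": len(variables),
        "units_of_observation": len(rows),
        "variables": [],
    }
    for position, name in enumerate(variables, start=1):
        # Short rows yield None for trailing variables; treat them as empty.
        values = [(row[name] or "").strip() for row in rows]
        frequencies = Counter(values)
        codebook["variables"].append({
            "name": name,
            "label": labels.get(name, ""),
            "position": position,  # location of the variable in the file
            "frequencies": dict(sorted(frequencies.items())),
            "missing": sum(n for value, n in frequencies.items() if value in MISSING_CODES),
        })
    return codebook


if __name__ == "__main__":
    # Hypothetical survey data and labels.
    data = "respondent_id,age_group,satisfaction\n1,2,4\n2,1,-9\n3,2,5\n4,2,4\n"
    labels = {
        "respondent_id": "Running respondent number",
        "age_group": "Age group of the respondent",
        "satisfaction": "Overall satisfaction (1-5)",
    }
    book = build_codebook(data, labels)
    print(book["number_of_variables"], "variables,", book["units_of_observation"], "units")
    for variable in book["variables"]:
        print(variable["position"], variable["name"], variable["frequencies"], "missing:", variable["missing"])
```

For identifier-like variables such as a running respondent number, the frequency distribution is not informative and can be left out of the final documentation.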

Changes and edits: Record information on any changes and edits made to the data during processing (e.g. removal of duplicates, removal of exceptional values). Some of the descriptive information can be documented in the data file itself.
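When the data is processed with a script, the processing step can write its own log entry, so that the record of changes does not depend on memory. The following is a minimal sketch for the removal of duplicates, assuming the rows are dictionaries of strings; the function names, the log format and the example rows are hypothetical.

```python
from datetime import date


def remove_duplicates(rows: list[dict], log: list[dict]) -> list[dict]:
    """Drop exact duplicate rows (keeping the first) and record the step."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    log.append({
        "date": date.today().isoformat(),
        "step": "removal of duplicates",
        "rows_before": len(rows),
        "rows_after": len(unique),
    })
    return unique


def format_log(log: list[dict]) -> str:
    """Render the change log as text for the metadata file."""
    return "\n".join(
        f"{entry['date']}: {entry['step']} "
        f"({entry['rows_before']} rows before, {entry['rows_after']} after)"
        for entry in log
    )


if __name__ == "__main__":
    # Hypothetical rows; the second and third are exact duplicates.
    rows = [
        {"respondent_id": "1", "satisfaction": "4"},
        {"respondent_id": "2", "satisfaction": "5"},
        {"respondent_id": "2", "satisfaction": "5"},
    ]
    change_log: list[dict] = []
    cleaned = remove_duplicates(rows, change_log)
    print(format_log(change_log))
```

Other edits, such as the removal of exceptional values, can append entries to the same log in the same way, together with the criterion that was used.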

Contextual information

Contextual information refers to the external circumstances and events that may have affected the units of observation at the time of data collection. For example, the economic situation, political events, changes in society, and sudden natural disasters or accidents at the time of data collection may affect the responses of research participants.

Contextual information should be added to the metadata as required.

Metropolia Library and Information Services