Edward G. Miner Library

Data Management: Documenting Data

This guide provides resources for managing and sharing your research data no matter the discipline.

What is Metadata?

Metadata provides the essential information needed to discover data (for example, a bibliographic citation) and to reuse it. It is important to add metadata tags to your data. Including keywords and phrases that describe your data and research ensures that other researchers can search for and locate your data in a given repository.
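As a rough illustration of why keywords matter, the sketch below stores two metadata records and searches them by keyword. The record fields, titles, and the find_by_keyword helper are hypothetical and do not reflect any particular repository's schema or search engine.

```python
import json

# Hypothetical metadata records; field names and values are illustrative only.
records = [
    {
        "title": "Sleep survey responses, 2021",
        "keywords": ["sleep", "survey", "adolescents"],
    },
    {
        "title": "Mouse cortex gene expression",
        "keywords": ["gene expression", "microarray", "mouse"],
    },
]


def find_by_keyword(records, term):
    """Return titles of records with a keyword containing the term (case-insensitive)."""
    term = term.lower()
    return [
        r["title"]
        for r in records
        if any(term in kw.lower() for kw in r["keywords"])
    ]


print(find_by_keyword(records, "survey"))
print(json.dumps(records[0], indent=2))
```

A dataset with no keywords would never appear in such a search, which is the practical reason to tag data before depositing it.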

Machine Learning Datasets and Models

The Data Cards Playbook is an emerging metadata standard focused on increasing transparency and providing structured documentation for machine learning datasets and models. The following resource provides a basic description: https://ai.googleblog.com/2022/11/the-data-cards-playbook-toolkit-for.html

Minimum Information About a Microarray Experiment (MIAME)

The MIAME Standard is widely used for gene expression data and can be adapted for spatial transcriptomics data.

Metadata Standards for Survey Data

  1. Quantitative survey data:

Data Documentation Initiative (DDI): A widely used international standard for describing data from social, behavioral, and economic sciences. DDI allows you to document and manage different stages of the data life cycle, from conceptualization to data distribution.

  2. Qualitative narrative audio and transcripts:

Dublin Core Metadata Initiative (DCMI): Dublin Core is a versatile metadata standard that can be applied to diverse digital resources, including audio recordings and text transcripts.

  3. Qualitative Data Repository (QDR) offers a set of guidelines for documenting and managing qualitative data, including interview transcripts and notes. This can be combined with the DDI standard to create comprehensive metadata for mixed-methods research.
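To make the Dublin Core option more concrete, here is a minimal sketch that builds a simple Dublin Core record for an interview recording using Python's standard library. The element names (title, creator, type, and so on) come from the Dublin Core element set; the wrapping "metadata" element, the dublin_core_record helper, and every value are placeholders chosen for illustration, and a given repository may expect a different wrapper or additional fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a simple Dublin Core XML record from (element, value) pairs."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical interview recording; all values are placeholders.
fields = [
    ("title", "Interview 07: caregiver experiences"),
    ("creator", "Example Research Team"),
    ("type", "Sound"),
    ("format", "audio/wav"),
    ("language", "en"),
    ("date", "2023-05-14"),
    ("subject", "caregiving"),
    ("subject", "qualitative interview"),
]

record = dublin_core_record(fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Elements such as subject can repeat, which is how several keywords are attached to one recording or transcript.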

Citing Data: Tools

DataCite -- A service for data publishers to mint DOIs and register associated metadata

EndNote -- Use EndNote to manage your datasets along with citations
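The metadata registered with a DOI is also what a data citation is built from. The sketch below assembles a citation string loosely following the pattern DataCite recommends (creator, publication year, title, version, publisher, identifier). The format_data_citation function is hypothetical, and the names, repository, and DOI are placeholders rather than a real dataset; check your target journal or repository for its exact citation style.

```python
def format_data_citation(creators, year, title, publisher, doi, version=None):
    """Assemble: Creator (Year). Title. Version. Publisher. DOI URL."""
    parts = [f"{'; '.join(creators)} ({year})", title]
    if version:
        parts.append(f"Version {version}")
    parts.append(publisher)
    return ". ".join(parts) + ". " + f"https://doi.org/{doi}"


# Placeholder values only; "10.1234/example" is not a real DOI.
citation = format_data_citation(
    creators=["Doe, J.", "Roe, R."],
    year=2024,
    title="Example survey dataset",
    publisher="Example Repository",
    doi="10.1234/example",
    version="1.0",
)
print(citation)
```

Keeping these fields in a reference manager alongside article citations makes it easier to cite the dataset consistently.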

Section 3 Rubric: Standards for Data and Metadata

An indication of what standards will be applied to the scientific data and associated metadata (i.e., data formats, data dictionaries, data identifiers, definitions, unique identifiers, and other data documentation).  While many scientific fields have developed and adopted common data standards, others have not. In such cases, the Plan may indicate that no consensus data standards exist for the scientific data and metadata to be generated, preserved, and shared.
Performance levels: Complete/detailed; Addressed issue, but incomplete; Did not address.

3.1 Identifies metadata standards and/or metadata formats that will be used for the proposed project

Complete/detailed: The metadata standard that will be followed is clearly stated and described. If no disciplinary standard exists, a project-specific approach is clearly described.

Addressed issue, but incomplete: The metadata standard that will be followed is vaguely stated. If no disciplinary standard exists, a project-specific approach is vaguely described.

Did not address: The metadata standard that will be followed is not stated and no project-specific approach is described.

3.2 Describes data formats created or used during project

Complete/detailed: Clearly describes data format standard(s) for the data.

Addressed issue, but incomplete: Describes some but not all data formats, or data format standards for the data. Where standards do not exist, does not propose how this will be addressed.

Did not address: Does not include information about data format standards.

3.3 Identifies data formats that will be used for storing data

Complete/detailed: Clearly describes data formats that will be used for storing data and explains rationale or complicating factors.

Addressed issue, but incomplete: Only partially describes data formats that will be used for storing data and/or the rationale or complicating factors.

Did not address: Does not describe data formats that will be used for storing data and does not explain rationale or complicating factors.

3.4 If the proposed project includes the use of unusual data formats, the plan discusses the proposed solution for converting data into more accessible formats

Complete/detailed: Explains how the data will be converted to a more accessible format or otherwise made available to interested parties. In general, solutions and remedies should be provided.

Addressed issue, but incomplete: Vaguely explains how the data may be converted to a more accessible format or otherwise made available to interested parties.

Did not address: Does not explain how the data will be converted to a more accessible format or otherwise made available to interested parties.
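For criterion 3.4, a plan is more convincing when the conversion step is concrete. The following is a minimal sketch, assuming a hypothetical instrument export that uses "#" comment lines and pipe-delimited rows; the file layout, column names, and the convert_to_csv helper are invented for illustration, and a real instrument format may need a vendor tool or a dedicated library instead.

```python
import csv
import io

# Hypothetical instrument export: '#' comment lines, then pipe-delimited rows.
raw_export = """# instrument: ExampleReader (hypothetical)
# units: temp_c in degrees Celsius
sample_id|temp_c|reading
S001|21.5|0.482
S002|22.1|0.517
"""


def convert_to_csv(text, delimiter="|"):
    """Drop comment lines and rewrite delimited text as standard CSV."""
    data_lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(data_lines, delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


csv_text = convert_to_csv(raw_export)
print(csv_text)
```

Note that this conversion discards the comment lines, so details such as units should be carried over into the data dictionary or other documentation that accompanies the CSV file.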

Metadata Standards for Biological and Biomedical Data

Clinical Data Interchange Standards Consortium (CDISC)

CDISC standards are designed specifically for the healthcare industry and are widely used for data related to clinical trials. More information: cdisc.org/standards

File Naming Applications