ISO 42001:2023 - A.4.3 Data Resources

This article provides guidance on how to implement ISO 42001:2023 control A.4.3, Data Resources.

ISO 42001 Control Description

As part of resource identification, the organisation shall document information about the data resources utilised for the AI system.

Control Objective

To ensure that the organisation accounts for the resources (including AI system components and assets) of the AI system in order to fully understand and address risks and impacts.

Purpose

To enable understanding of data-related risks and impacts, support AI system impact assessments, and ensure legal and ethical compliance. Data documentation is critical for AI systems, particularly those using machine learning, where data quality directly affects system performance, fairness, and trustworthiness.

Guidance on Implementation

Documentation of data resources should include the following:

What to Document

a) Data provenance
  • Origin and sources of data (internal systems, external providers, public datasets)
  • Data holders, custodians, and users
  • Processes applied to data (cleaning, labeling, transformation)
  • Storage locations and history
b) Data categorisation (for machine learning systems)
  • Training data - used to train the model
  • Validation data - used to tune hyperparameters and select between candidate models
  • Test data - used for final evaluation (must be kept separate)
  • Production data - real-world data the deployed system processes
c) Data quality (reference ISO/IEC 5259 series)
  • Accuracy, completeness, consistency, timeliness
  • Validity, uniqueness, integrity
  • Representativeness and appropriateness for intended use
  • Quality assessment results and acceptance criteria
d) Temporal information
  • Date data was last updated or modified
  • Time period covered by the dataset
  • Data currency relative to deployment context
e) Data labeling process (for supervised learning)
  • Who labeled the data and labeling methodology
  • Quality control measures and inter-annotator agreement
  • Error rates and consistency checks
f) Intended use and restrictions
  • Purpose for which data is collected and used
  • Contractual, legal, or ethical restrictions on use
  • Prohibited uses
g) Known or potential bias issues
  • Demographic, geographic, or temporal imbalances
  • Selection biases in data collection
  • Mitigation measures taken and residual risks
h) Data preparation methods
  • Cleaning, normalisation, feature engineering
  • Data augmentation or synthetic data generation
  • Preprocessing steps applied
i) Data retention and disposal
  • Retention periods and triggers
  • Secure deletion procedures
  • Archival strategies for compliance or reproducibility
j) Legal and ethical compliance
  • Legal basis for data processing (GDPR Article 6)
  • Consent mechanisms and data subject rights
  • Privacy protections (anonymisation, pseudonymisation)
  • Data sharing agreements and IP considerations
k) Metadata and documentation standards
  • Structured metadata schema (reference ISO/IEC 23751)
  • Datasheets or data cards describing datasets
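
As an illustration of how the topics above could be captured in a structured record, here is a minimal Python sketch. The class name DatasetRecord, its field names, and the missing_fields helper are assumptions made for this example; they are not prescribed by ISO/IEC 42001 or by any datasheet standard, and a real template would likely carry more detail per topic.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import Optional


@dataclass
class DatasetRecord:
    """One documentation entry per dataset (hypothetical field names)."""
    name: str
    sources: list[str] = field(default_factory=list)            # a) provenance
    category: str = ""                                           # b) training / validation / test / production
    quality_notes: str = ""                                      # c) quality results and acceptance criteria
    last_updated: Optional[date] = None                          # d) temporal information
    labeling_process: str = ""                                   # e) labeling (supervised learning only)
    intended_use: str = ""                                       # f) intended use and restrictions
    known_biases: list[str] = field(default_factory=list)        # g) known or potential bias issues
    preparation_steps: list[str] = field(default_factory=list)   # h) data preparation methods
    retention_period_days: Optional[int] = None                  # i) retention and disposal
    legal_basis: str = ""                                        # j) legal and ethical compliance


def missing_fields(record: DatasetRecord) -> list[str]:
    """Return the names of fields that are still empty (None, "" or [])."""
    return [f.name for f in fields(record) if getattr(record, f.name) in (None, "", [])]


record = DatasetRecord(name="customer_churn_v1", sources=["internal CRM"], category="training")
print(missing_fields(record))
```

A check like this only shows which topics have no entry at all. It says nothing about whether the entries are accurate, which still needs review by the data owner.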

Implementation Steps

Organisations should:

  1. Create a data inventory - Identify all datasets used across the AI system lifecycle (development, production, monitoring)
  2. Assign ownership - Designate data owners, custodians, and stewards for each dataset
  3. Document systematically - Use standardised templates to document each dataset covering the topics above
  4. Store centrally - Maintain documentation in accessible, version-controlled repositories (data catalogs, metadata management systems)
  5. Link to impact assessments - Use data documentation to inform AI system impact assessments (Clause 6.1.4)
  6. Maintain and update - Review and update documentation when data changes or at planned intervals
  7. Integrate with governance - Align with data governance policies, data protection impact assessments (DPIA), and data quality management processes
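
Step 6 can be partly automated. The sketch below assumes the inventory is available as a mapping from dataset name to the date its documentation was last reviewed, and that the planned interval is 365 days. Both the function name overdue_reviews and the default interval are assumptions for this example, not requirements of the standard.

```python
from datetime import date, timedelta


def overdue_reviews(inventory: dict[str, date], today: date, interval_days: int = 365) -> list[str]:
    """Return dataset names whose documentation review is older than the planned interval."""
    cutoff = today - timedelta(days=interval_days)
    return sorted(name for name, reviewed in inventory.items() if reviewed < cutoff)


# Hypothetical inventory: dataset name -> date of last documentation review
inventory = {
    "training_set": date(2023, 1, 1),
    "test_set": date(2024, 6, 1),
}
print(overdue_reviews(inventory, today=date(2024, 7, 1)))
```

This covers only the "planned intervals" trigger. Reviews triggered by changes to the data itself would need a separate signal, for example from the pipeline that updates the dataset.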

Key Considerations

Data quality trade-offs: Perfect data is rare. Document known limitations, mitigation approaches, and residual risks.

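Separation of datasets: For machine learning, training, validation, and test data must be kept separate to ensure valid performance evaluation. If records from the test set also appear in the training or validation data, the model is evaluated on data it has already seen and the reported performance metrics will be overly optimistic.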
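
One way to gather evidence of separation is to count records that appear in more than one split. The following sketch assumes each split fits in memory as a list of tuples; the function names fingerprint and split_overlaps are hypothetical. It detects exact duplicates only, so near-duplicates (for example, the same record with different whitespace) would not be caught.

```python
import hashlib


def fingerprint(row: tuple) -> str:
    """Hash a record so splits can be compared without holding the raw rows."""
    return hashlib.sha256(repr(row).encode("utf-8")).hexdigest()


def split_overlaps(splits: dict[str, list[tuple]]) -> dict[tuple[str, str], int]:
    """Count identical records shared between each pair of splits."""
    prints = {name: {fingerprint(r) for r in rows} for name, rows in splits.items()}
    names = sorted(prints)
    return {
        (a, b): len(prints[a] & prints[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


splits = {
    "train": [(1, "a"), (2, "b")],
    "validation": [(4, "d")],
    "test": [(2, "b"), (3, "c")],
}
print(split_overlaps(splits))
```

Any non-zero count is worth recording in the dataset documentation together with the action taken.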

Bias assessment: Failing to identify and document bias in training data can lead to discriminatory AI systems. Use demographic analysis to assess how representative the data is, and fairness metrics to assess whether the resulting system treats groups differently.
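
A simple form of demographic analysis compares each group's share of the dataset with its share of a reference population. The sketch below assumes the group label of every record is available as a list and that reference shares are known; the function name representation_gaps and the example figures are illustrative only.

```python
from collections import Counter


def representation_gaps(groups: list[str], reference: dict[str, float]) -> dict[str, float]:
    """Observed share minus reference share for each group in the reference.

    Positive values mean the group is over-represented in the dataset,
    negative values mean it is under-represented.
    """
    if not groups:
        raise ValueError("groups must not be empty")
    counts = Counter(groups)
    total = len(groups)
    return {g: counts[g] / total - share for g, share in reference.items()}


# Hypothetical example: group labels per record vs. assumed population shares
labels = ["urban"] * 3 + ["rural"] * 1
print(representation_gaps(labels, {"urban": 0.5, "rural": 0.5}))
```

What size of gap is acceptable depends on the intended use of the system, so the threshold and the reference population should themselves be documented.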

Dynamic data: If using continuous learning (learning from production data), document how production data becomes training data, establish quality monitoring, and address privacy implications.

Data sensitivity: Personal data (PII) requires additional documentation for GDPR/CCPA compliance, including legal basis, consent records, and privacy protection measures.

Provenance tracking: Maintain a complete history of data transformations to enable traceability when issues arise and to support audit requirements.
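
A transformation history is more useful for audits if later edits to it can be detected. One possible approach, shown here as a sketch rather than a recommended design, is to chain each log entry to the previous one with a hash. The function names append_step and verify, and the entry layout, are assumptions for this example.

```python
import hashlib
import json


def _digest(step: str, details: dict, prev_hash: str) -> str:
    """Hash one entry together with the hash of the entry before it."""
    body = {"step": step, "details": details, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_step(log: list[dict], step: str, details: dict) -> list[dict]:
    """Return a new log with one more transformation entry appended."""
    prev_hash = log[-1]["hash"] if log else ""
    entry = {"step": step, "details": details, "prev_hash": prev_hash,
             "hash": _digest(step, details, prev_hash)}
    return log + [entry]


def verify(log: list[dict]) -> bool:
    """Check that no entry was altered, removed, or reordered."""
    prev_hash = ""
    for entry in log:
        expected = _digest(entry["step"], entry["details"], entry["prev_hash"])
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


log = append_step([], "ingest", {"source": "internal CRM export"})
log = append_step(log, "clean", {"rows_dropped": 3})
print(verify(log))
```

The chain only detects changes to the log itself. It does not prove that the logged steps match what was actually done to the data, so it complements rather than replaces pipeline controls.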

If the organisation shares data resource documentation externally, ensure confidential information is not disclosed.
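
If documentation is held as structured records, an externally shareable view can be produced by dropping fields marked confidential. The sketch below assumes a flat dictionary per dataset and a hypothetical set of confidential keys; nested records would need a recursive variant, and free-text fields can still contain confidential details that a key-based filter will not catch.

```python
def external_view(record: dict, confidential: set[str]) -> dict:
    """Return a copy of the record without the fields marked confidential."""
    return {key: value for key, value in record.items() if key not in confidential}


# Hypothetical record and confidentiality marking
record = {
    "name": "customer_churn_v1",
    "intended_use": "churn prediction",
    "storage_location": "internal storage path",
    "supplier_contract": "contract reference",
}
print(external_view(record, confidential={"storage_location", "supplier_contract"}))
```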

Related Controls

Within ISO/IEC 42001:

  • A.4.2 Resource documentation
  • A.7.3 Acquisition of data
  • A.7.4 Quality of data for AI systems

Integration with ISO 27001 (if applicable):

  • A.5.9 Inventory of information and other associated assets
  • A.5.34 Privacy and protection of PII

Integration with GDPR:

  • Article 30: Records of processing activities
  • Article 35: Data Protection Impact Assessment (DPIA)