ISO/IEC 42001:2023 - A.4.3 Data Resources

ISO 42001 Control Description

As part of resource identification, the organisation shall document information about the data resources utilised for the AI system.

Control Objective

To ensure that the organisation accounts for the resources (including AI system components and assets) of the AI system in order to fully understand and address risks and impacts.

Purpose

To enable understanding of data-related risks and impacts, support AI system impact assessments, and ensure legal and ethical compliance. Data documentation is critical for AI systems, particularly those using machine learning, where data quality directly affects system performance, fairness, and trustworthiness.

Guidance on Implementation

Documentation of data resources should include the following:

What to Document

a) Data provenance

Origin and sources of data (internal systems, external providers, public datasets)
Data holders, custodians, and users
Processes applied to data (cleaning, labelling, transformation)
Storage locations and history

b) Data categorisation (for machine learning systems)

Training data - used to train the model
Validation data - used to tune hyperparameters and select between candidate models during development
Test data - used for final evaluation (must be kept separate)
Production data - real-world data the deployed system processes

c) Data quality (reference ISO/IEC 5259 series)

Accuracy, completeness, consistency, timeliness
Validity, uniqueness, integrity
Representativeness and appropriateness for intended use
Quality assessment results and acceptance criteria
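
As a rough illustration of how two of the dimensions above (completeness and uniqueness) could be measured and compared with acceptance criteria, the following sketch uses only the Python standard library. The function names, the record layout, and the 0.95 acceptance threshold are assumptions made for this example; they are not taken from ISO/IEC 42001 or the ISO/IEC 5259 series.

```python
from typing import Any


def completeness(rows: list[dict[str, Any]], field: str) -> float:
    """Share of rows in which `field` is present and neither None nor empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows: list[dict[str, Any]], key: str) -> float:
    """Share of distinct values of `key` among the rows that carry a value."""
    values = [r[key] for r in rows if r.get(key) is not None]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def quality_report(rows, key, fields, min_score=0.95):
    """Score each check and record whether it meets the (assumed) threshold."""
    scores = {f"completeness[{f}]": completeness(rows, f) for f in fields}
    scores[f"uniqueness[{key}]"] = uniqueness(rows, key)
    return {
        name: {"score": round(score, 3), "accepted": score >= min_score}
        for name, score in scores.items()
    }
```

The resulting dictionary can be stored alongside the dataset documentation as the quality assessment result, together with the threshold that was applied.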

d) Temporal information

Date data was last updated or modified
Time period covered by the dataset
Data currency relative to deployment context

e) Data labelling process (for supervised learning)

Who labelled the data and the labelling methodology used
Quality control measures and inter-annotator agreement
Error rates and consistency checks
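
Inter-annotator agreement is often summarised with a chance-corrected statistic. The sketch below computes Cohen's kappa for two annotators who labelled the same items; it is one common choice rather than a measure prescribed by the standard, and the function name is an assumption for this example.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled independently at their own rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)
```

Recording the statistic, the sample it was computed on, and the date gives reviewers a concrete figure to compare against whatever agreement threshold the organisation sets.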

f) Intended use and restrictions

Purpose for which data is collected and used
Contractual, legal, or ethical restrictions on use
Prohibited uses

g) Known or potential bias issues

Demographic, geographic, or temporal imbalances
Selection biases in data collection
Mitigation measures taken and residual risks

h) Data preparation methods

Cleaning, normalisation, feature engineering
Data augmentation or synthetic data generation
Preprocessing steps applied

i) Data retention and disposal

Retention periods and triggers
Secure deletion procedures
Archival strategies for compliance or reproducibility
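
Retention periods and triggers lend themselves to a simple automated check. The following is a minimal sketch under the assumption that each dataset has one trigger date (for example, collection date or contract end) and one retention period in days; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RetentionRule:
    dataset: str
    trigger_date: date    # assumed trigger, e.g. collection date or contract end
    retention_days: int   # assumed retention period counted from the trigger

    def disposal_due(self) -> date:
        """Date on which the retention period ends."""
        return self.trigger_date + timedelta(days=self.retention_days)


def due_for_disposal(rules: list[RetentionRule], today: date) -> list[str]:
    """Names of datasets whose retention period has ended on or before `today`."""
    return [r.dataset for r in rules if r.disposal_due() <= today]
```

A scheduled job could run such a check and open a disposal ticket for each dataset returned, leaving the secure deletion itself to the documented procedure.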

j) Legal and ethical compliance

Legal basis for data processing (GDPR Article 6)
Consent mechanisms and data subject rights
Privacy protections (anonymisation, pseudonymisation)
Data sharing agreements and IP considerations

k) Metadata and documentation standards

Structured metadata schema (reference ISO/IEC 23751)
Datasheets or data cards describing datasets
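
One way to make the topics above machine-readable is a small data card record that can be serialised and checked for gaps. The sketch below is illustrative only: the field names are assumptions chosen to mirror a subset of items a) to k), not a schema defined by ISO/IEC 42001 or any other standard.

```python
import json
from dataclasses import asdict, dataclass, field

ALLOWED_CATEGORIES = {"training", "validation", "test", "production"}


@dataclass
class DataCard:
    name: str
    source: str                 # provenance: origin of the data
    owner: str                  # accountable data owner
    category: str               # one of ALLOWED_CATEGORIES
    last_updated: str           # ISO 8601 date string
    period_covered: str
    intended_use: str
    restrictions: list[str] = field(default_factory=list)
    known_biases: list[str] = field(default_factory=list)
    preparation_steps: list[str] = field(default_factory=list)
    retention: str = ""
    legal_basis: str = ""

    def __post_init__(self) -> None:
        if self.category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown data category: {self.category!r}")

    def missing_fields(self) -> list[str]:
        """Fields left empty, so reviewers can see what is still undocumented."""
        return [k for k, v in asdict(self).items() if v in ("", [], None)]

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)
```

Keeping cards as structured records (rather than free text) makes it straightforward to version them and to list which datasets still have undocumented topics.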

Implementation Steps

Organisations should:

Create a data inventory - Identify all datasets used across the AI system lifecycle (development, production, monitoring)
Assign ownership - Designate data owners, custodians, and stewards for each dataset
Document systematically - Use standardised templates to document each dataset covering the topics above
Store centrally - Maintain documentation in accessible, version-controlled repositories (data catalogues, metadata management systems)
Link to impact assessments - Use data documentation to inform AI system impact assessments (Clause 6.1.4)
Maintain and update - Review and update documentation when data changes or at planned intervals
Integrate with governance - Align with data governance policies, data protection impact assessments (DPIA), and data quality management processes
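
The inventory, ownership, and review steps above can be supported by a simple consistency check over the inventory itself. The sketch below assumes each inventory entry is a dictionary with hypothetical keys `dataset`, `owner`, and `last_reviewed`, and assumes a 180-day review interval; an organisation would substitute its own catalogue fields and planned interval.

```python
from datetime import date, timedelta


def inventory_findings(inventory: list[dict], today: date,
                       review_interval_days: int = 180) -> list[tuple[str, str]]:
    """Return (dataset, finding) pairs for entries lacking an owner or a timely review."""
    findings = []
    for entry in inventory:
        if not entry.get("owner"):
            findings.append((entry["dataset"], "no owner assigned"))
        last = entry.get("last_reviewed")
        if last is None or today - last > timedelta(days=review_interval_days):
            findings.append((entry["dataset"], "documentation review overdue"))
    return findings
```

Findings of this kind could feed the planned-interval review, so that gaps in ownership or stale documentation are raised before an audit does.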

Key Considerations

Data quality trade-offs: Perfect data is rare. Document known limitations, mitigation approaches, and residual risks.

Separation of datasets: For machine learning, training, validation, and test data must be kept separate to ensure valid performance evaluation. Overlap between them leaks information into the evaluation and produces optimistically biased performance metrics.
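
A basic separation check can be automated by fingerprinting records and counting overlaps between splits. The sketch below is a minimal example under the assumption that records are JSON-serialisable dictionaries; it detects exact duplicates only, not near-duplicates or related records (such as several rows from the same person), and the function names are hypothetical.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Stable SHA-256 hash of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def split_overlaps(splits: dict[str, list[dict]]) -> dict[tuple[str, str], int]:
    """Count identical records shared by each pair of splits."""
    prints = {name: {fingerprint(r) for r in rows} for name, rows in splits.items()}
    names = sorted(prints)
    return {
        (a, b): len(prints[a] & prints[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

Any non-zero count indicates leakage that should be resolved, and the result of the check can be recorded in the dataset documentation as evidence of separation.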

Bias assessment: Failing to identify and document bias in training data can lead to discriminatory AI systems. Use demographic analysis and fairness metrics to assess representativeness.
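
As a simple starting point for assessing representativeness, group shares in a dataset can be compared with reference shares for the population the system is meant to serve. The sketch below is illustrative: the function names and the 0.1 tolerance are assumptions, the reference shares must come from a source the organisation trusts, and a check like this does not replace fairness metrics computed on model outputs.

```python
from collections import Counter


def representation_gaps(groups: list[str],
                        reference_shares: dict[str, float]) -> dict[str, float]:
    """Dataset share minus reference share for each reference group."""
    if not groups:
        raise ValueError("dataset has no group labels")
    counts = Counter(groups)
    total = len(groups)
    return {g: counts[g] / total - share for g, share in reference_shares.items()}


def flag_imbalances(groups: list[str], reference_shares: dict[str, float],
                    tolerance: float = 0.1) -> list[str]:
    """Groups whose share deviates from the reference by more than `tolerance`."""
    gaps = representation_gaps(groups, reference_shares)
    return sorted(g for g, gap in gaps.items() if abs(gap) > tolerance)
```

Flagged groups, the reference used, and any mitigation taken can then be recorded under known or potential bias issues.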

Dynamic data: If using continuous learning (learning from production data), document how production data becomes training data, establish quality monitoring, and address privacy implications.

Data sensitivity: Personal data (PII) requires additional documentation for GDPR/CCPA compliance, including legal basis, consent records, and privacy protection measures.
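
One privacy protection measure that is commonly documented is keyed pseudonymisation of direct identifiers. The sketch below uses HMAC-SHA256 from the Python standard library; the function names and field list are assumptions for illustration, and key management is out of scope here. Pseudonymised data generally remains personal data under GDPR, because anyone holding the key can re-link records to individuals.

```python
import hashlib
import hmac


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash; the same input and key give the same token."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_record(record: dict, pii_fields: list[str], secret_key: bytes) -> dict:
    """Return a copy of `record` with the listed fields replaced by pseudonyms."""
    result = dict(record)
    for name in pii_fields:
        if result.get(name) is not None:
            result[name] = pseudonymise(str(result[name]), secret_key)
    return result
```

The documentation would then record which fields were pseudonymised, which method was used, and who controls the key, without recording the key itself.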

Provenance tracking: Maintain a complete history of data transformations so that issues can be traced to their source and audit requirements can be met.
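
A transformation history is easier to rely on in an audit if later edits to it are detectable. The sketch below shows one possible approach, a hash-chained log in which each entry includes the hash of the previous one; the entry layout and function names are assumptions for this example, not a format required by the standard.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_step(log: list[dict], step: str, details: dict) -> list[dict]:
    """Append a transformation step linked to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else ""
    body = {"step": step, "details": details, "prev_hash": prev_hash}
    log.append({**body, "hash": _digest(body)})
    return log


def verify(log: list[dict]) -> bool:
    """True if every entry is unmodified and correctly linked to its predecessor."""
    prev = ""
    for entry in log:
        body = {k: entry[k] for k in ("step", "details", "prev_hash")}
        if entry["prev_hash"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True
```

Because each hash depends on all earlier entries, changing or removing a past step causes verification to fail from that point onward.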

If the organisation shares data resource documentation externally, ensure confidential information is not disclosed.

Related Controls

Within ISO/IEC 42001:

A.4.2 Resource documentation
A.7.3 Acquisition of data
A.7.4 Quality of data for AI systems

Integration with ISO/IEC 27001:2022 (if applicable):

A.5.9 Inventory of information and other associated assets
A.5.34 Privacy and protection of PII

Integration with GDPR:

Article 30: Records of processing activities
Article 35: Data Protection Impact Assessment (DPIA)