ISO 42001:2023 - A.4.3 Data Resources
This article provides guidance on implementing ISO 42001:2023 control A.4.3 Data Resources.
ISO 42001 Control Description
As part of resource identification, the organisation shall document information about the data resources utilised for the AI system.
Control Objective
To ensure that the organisation accounts for the resources (including AI system components and assets) of the AI system in order to fully understand and address risks and impacts.
Purpose
To enable understanding of data-related risks and impacts, support AI system impact assessments, and ensure legal and ethical compliance. Data documentation is critical for AI systems, particularly those using machine learning, where data quality directly affects system performance, fairness, and trustworthiness.
Guidance on Implementation
Documentation of data resources should include the following:
What to Document
a) Data provenance
- Origin and sources of data (internal systems, external providers, public datasets)
- Data holders, custodians, and users
- Processes applied to data (cleaning, labelling, transformation)
- Storage locations and history
b) Data categories
- Training data - used to train the model
- Validation data - used to tune hyperparameters and select between models
- Test data - used for final evaluation (must be kept separate from training and validation data)
- Production data - real-world data the deployed system processes
c) Data quality
- Accuracy, completeness, consistency, timeliness
- Validity, uniqueness, integrity
- Representativeness and appropriateness for intended use
- Quality assessment results and acceptance criteria
d) Data currency
- Date data was last updated or modified
- Time period covered by the dataset
- Data currency relative to deployment context
e) Data labelling
- Who labelled the data and labelling methodology
- Quality control measures and inter-annotator agreement
- Error rates and consistency checks
f) Intended use
- Purpose for which data is collected and used
- Contractual, legal, or ethical restrictions on use
- Prohibited uses
g) Known or potential bias
- Demographic, geographic, or temporal imbalances
- Selection biases in data collection
- Mitigation measures taken and residual risks
h) Data preparation
- Cleaning, normalisation, feature engineering
- Data augmentation or synthetic data generation
- Preprocessing steps applied
i) Retention and disposal
- Retention periods and triggers
- Secure deletion procedures
- Archival strategies for compliance or reproducibility
j) Legal and privacy considerations
- Legal basis for data processing (GDPR Article 6)
- Consent mechanisms and data subject rights
- Privacy protections (anonymisation, pseudonymisation)
- Data sharing agreements and IP considerations
k) Documentation format
- Structured metadata schema (reference ISO/IEC 23751)
- Datasheets or data cards describing datasets
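The topics above can be captured in a structured record per dataset. The following Python sketch shows one possible shape for such a data card. The class name `DatasetRecord`, its field names, the `ALLOWED_CATEGORIES` set, and the example values are all illustrative assumptions; ISO 42001 does not prescribe a schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List, Optional, Tuple

# Assumed category vocabulary, taken from the data categories listed in this article.
ALLOWED_CATEGORIES = {"training", "validation", "test", "production"}


@dataclass
class DatasetRecord:
    """Illustrative documentation record for one dataset (field names are assumptions)."""

    name: str
    category: str                      # one of ALLOWED_CATEGORIES
    source: str                        # provenance: where the data came from
    owner: str                         # accountable data owner
    last_updated: date
    period_covered: Tuple[date, date]  # (start, end) of the data's time span
    intended_use: str
    prohibited_uses: List[str] = field(default_factory=list)
    known_biases: List[str] = field(default_factory=list)
    preparation_steps: List[str] = field(default_factory=list)
    retention_until: Optional[date] = None
    legal_basis: Optional[str] = None

    def __post_init__(self) -> None:
        if self.category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown data category: {self.category!r}")

    def to_json(self) -> str:
        # default=str renders date objects as ISO strings such as "2024-01-31".
        return json.dumps(asdict(self), default=str, indent=2)


# Hypothetical example values, for illustration only.
record = DatasetRecord(
    name="loan-applications-2023",
    category="training",
    source="internal CRM export",
    owner="credit-risk-data-owner",
    last_updated=date(2024, 1, 31),
    period_covered=(date(2023, 1, 1), date(2023, 12, 31)),
    intended_use="credit scoring model development",
    known_biases=["under-representation of applicants under 25"],
    preparation_steps=["deduplication", "normalisation of income fields"],
)
print(record.to_json())
```

Serialising to JSON keeps the record easy to store in a version-controlled repository and to diff when the dataset changes.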
Implementation Steps
Organisations should:
- Create a data inventory - Identify all datasets used across the AI system lifecycle (development, production, monitoring)
- Assign ownership - Designate data owners, custodians, and stewards for each dataset
- Document systematically - Use standardised templates to document each dataset covering the topics above
- Store centrally - Maintain documentation in accessible, version-controlled repositories (data catalogs, metadata management systems)
- Link to impact assessments - Use data documentation to inform AI system impact assessments (Clause 6.1.4)
- Maintain and update - Review and update documentation when data changes or at planned intervals
- Integrate with governance - Align with data governance policies, data protection impact assessments (DPIA), and data quality management processes
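As a rough illustration of the "assign ownership" and "maintain and update" steps, the sketch below scans a data inventory for entries with no owner or with an overdue review. The inventory layout (a dictionary keyed by dataset name), the `owner` and `last_reviewed` keys, and the 365-day `REVIEW_INTERVAL` are assumptions chosen for the example, not requirements of the standard.

```python
from datetime import date, timedelta
from typing import Dict, List

# Assumed planned review interval; set this according to your own policy.
REVIEW_INTERVAL = timedelta(days=365)


def audit_inventory(inventory: Dict[str, dict], today: date) -> Dict[str, List[str]]:
    """Return the issues found for each dataset; datasets with no issues are omitted."""
    findings: Dict[str, List[str]] = {}
    for name, entry in inventory.items():
        issues = []
        if not entry.get("owner"):
            issues.append("no owner assigned")
        last_reviewed = entry.get("last_reviewed")
        if last_reviewed is None or today - last_reviewed > REVIEW_INTERVAL:
            issues.append("review overdue")
        if issues:
            findings[name] = issues
    return findings


# Hypothetical inventory entries, for illustration only.
inventory = {
    "training-set": {"owner": "alice", "last_reviewed": date(2024, 6, 1)},
    "validation-set": {"owner": "", "last_reviewed": date(2023, 1, 1)},
    "production-feed": {"owner": "carol"},
}
findings = audit_inventory(inventory, today=date(2025, 1, 1))
for dataset, issues in findings.items():
    print(dataset, "->", ", ".join(issues))
```

A check like this could run on a schedule against the central repository so that stale documentation is flagged rather than discovered during an audit.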
Key Considerations
Data quality trade-offs: Perfect data is rare. Document known limitations, mitigation approaches, and residual risks.
Separation of datasets: In machine learning, training, validation, and test data must be kept separate to ensure valid performance evaluation. If records leak between these datasets, performance metrics become unreliable and are typically over-optimistic.
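One simple way to evidence dataset separation is to fingerprint each record and count fingerprints shared between splits. The sketch below assumes records can be represented as strings and treats two records as identical after trimming whitespace and lower-casing; the function names `fingerprint` and `split_overlaps` are hypothetical. It only detects exact duplicates after that normalisation, so near-duplicates would need a different technique.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(record: str) -> str:
    """Hash a record after light normalisation (assumed: trim and lower-case)."""
    return hashlib.sha256(record.strip().lower().encode("utf-8")).hexdigest()


def split_overlaps(splits: Dict[str, List[str]]) -> Dict[Tuple[str, str], int]:
    """Count records shared by each pair of splits; pairs with no overlap are omitted."""
    hashed = {name: {fingerprint(r) for r in records} for name, records in splits.items()}
    names = sorted(hashed)
    overlaps: Dict[Tuple[str, str], int] = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = len(hashed[first] & hashed[second])
            if shared:
                overlaps[(first, second)] = shared
    return overlaps


# Hypothetical splits, for illustration only.
splits = {
    "train": ["record a", "record b", "record c"],
    "validation": ["record d"],
    "test": ["Record C ", "record e"],
}
overlaps = split_overlaps(splits)
print(overlaps)
```

An empty result can be recorded in the dataset documentation as evidence that the splits were disjoint at the time of the check.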
Bias assessment: Failing to identify and document bias in training data can lead to discriminatory AI systems. Use demographic analysis and fairness metrics to assess representativeness.
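As a minimal illustration of a representativeness check, the sketch below compares each group's share in a dataset with a reference share for the intended population. The function name `representation_gaps`, the group labels, and the reference shares are assumptions for the example; in practice the reference distribution and the acceptable gap would come from the organisation's own analysis. This is a descriptive check only and does not replace fairness metrics computed on model outputs.

```python
from collections import Counter
from typing import Dict, List


def representation_gaps(groups: List[str], reference_shares: Dict[str, float]) -> Dict[str, float]:
    """Return dataset share minus reference share for each reference group.

    Positive values mean the group is over-represented in the dataset,
    negative values mean it is under-represented.
    """
    if not groups:
        raise ValueError("dataset contains no records")
    counts = Counter(groups)
    total = len(groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }


# Hypothetical group labels and reference shares, for illustration only.
dataset_groups = ["group_a"] * 8 + ["group_b"] * 2
reference = {"group_a": 0.5, "group_b": 0.4, "group_c": 0.1}
gaps = representation_gaps(dataset_groups, reference)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

The resulting gaps, together with any mitigation taken, are the kind of evidence that belongs in the dataset's bias documentation.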
Dynamic data: If the AI system uses continuous learning (learning from production data), document how production data becomes training data, establish quality monitoring for that data, and address the privacy implications.
Data sensitivity: Personal data (PII) requires additional documentation for GDPR/CCPA compliance, including legal basis, consent records, and privacy protection measures.
Provenance tracking: Maintain complete history of data transformations to enable traceability when issues arise and support audit requirements.
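One way to keep a transformation history that is hard to alter unnoticed is an append-only log in which each entry stores the hash of the previous entry. The sketch below is an assumption about how this could be done, not a method taken from the standard; the function names `append_step` and `verify` and the entry fields are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def _digest(body: Dict[str, str]) -> str:
    # sort_keys makes the serialisation, and therefore the hash, deterministic.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_step(log: List[Dict[str, str]], step: str, details: str) -> List[Dict[str, str]]:
    """Append a transformation step, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else ""
    body = {"step": step, "details": details, "prev_hash": prev_hash}
    log.append({**body, "hash": _digest(body)})
    return log


def verify(log: List[Dict[str, str]]) -> bool:
    """Return True if no entry was altered, removed, or reordered."""
    prev_hash = ""
    for entry in log:
        body = {key: entry[key] for key in ("step", "details", "prev_hash")}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical transformation history, for illustration only.
history: List[Dict[str, str]] = []
append_step(history, "cleaning", "removed rows with missing income")
append_step(history, "normalisation", "scaled income to [0, 1]")
print(verify(history))
```

Because each hash covers the previous one, editing an earlier entry invalidates every later entry, which supports traceability and audit requests.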
If the organisation shares data resource documentation externally, ensure confidential information is not disclosed.
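A simple safeguard for external sharing is to publish a filtered view of each documentation record. The sketch below assumes records are dictionaries and that the organisation maintains its own list of confidential field names; `CONFIDENTIAL_FIELDS`, `external_view`, and the example record are hypothetical.

```python
from typing import Any, Dict

# Assumed list of fields that must not leave the organisation.
CONFIDENTIAL_FIELDS = {"storage_location", "owner_contact", "contract_terms"}


def external_view(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without confidential fields."""
    return {key: value for key, value in record.items() if key not in CONFIDENTIAL_FIELDS}


# Hypothetical record, for illustration only.
internal_record = {
    "name": "loan-applications-2023",
    "intended_use": "credit scoring model development",
    "storage_location": "internal data lake, restricted zone",
    "owner_contact": "data-owner@example.com",
}
shared = external_view(internal_record)
print(shared)
```

A deny-list like this fails open when new fields are added, so an allow-list of fields approved for sharing may be the safer design.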
Related Controls
Within ISO/IEC 42001:
- A.4.2 Resource documentation
- A.7.3 Acquisition of data
- A.7.4 Quality of data for AI systems
Integration with ISO 27001 (if applicable):
- A.5.9 Inventory of information and other associated assets
- A.5.34 Privacy and protection of PII
Integration with GDPR:
- Article 30: Records of processing activities
- Article 35: Data Protection Impact Assessment (DPIA)