
ISO 42001:2023 - A.7.5 AI System Monitoring

This article provides guidance on how to implement ISO 42001:2023 control A.7.5, AI System Monitoring.

ISO 42001 Control Description

The organisation shall establish and operate a monitoring programme for AI systems in operational use, covering system performance, output quality, operational behaviour, and alignment with intended use, to enable timely detection of issues and support ongoing accountability and governance.

Control Objective

To ensure that AI systems in operational use are subject to ongoing observation and evaluation, enabling the organisation to detect performance degradation, behavioural anomalies, emerging risks, and non-conformances with requirements in a timely manner, and to respond to these issues effectively before material harm occurs.

Purpose

Unlike conventional software systems, whose behaviour is typically deterministic and can be fully verified through testing, AI systems may behave differently when exposed to real-world operational data than they did in controlled testing environments. Model performance can degrade over time as the statistical properties of operational data diverge from those of the training data, a phenomenon commonly referred to as model drift. The distribution of inputs encountered in operation may shift in ways that were not anticipated during development, and user interactions may surface failure modes that were not identified during verification and validation.

These characteristics mean that the deployment of an AI system is not the end of the risk management process but the beginning of a continuous obligation to observe, evaluate, and respond to system behaviour. Monitoring is the primary mechanism through which this obligation is discharged. A well-designed monitoring programme provides the organisation with the operational visibility required to act as a responsible steward of its AI systems, protecting both the interests of the organisation and the interests of individuals whose lives and decisions are influenced by system outputs.


Guidance on Implementation

Defining Monitoring Objectives and Scope

The organisation shall define clear objectives for the monitoring of each AI system, informed by the system's intended use, risk profile, and the requirements established during the development lifecycle. Monitoring objectives shall address performance against defined metrics, the ongoing validity of the system's outputs in the operational context, alignment with intended use boundaries, and indicators of emerging risks or adverse impacts.

The scope of monitoring shall be commensurate with the risk profile of the system: higher-risk systems warrant more comprehensive, more frequent, and more rigorously reviewed monitoring.
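
As a concrete illustration of how monitoring objectives and scope could be captured per system, the Python sketch below records them as a declarative plan. The class, field names, risk tiers, and example values are illustrative assumptions, not structures prescribed by the standard.

```python
from dataclasses import dataclass, field

@dataclass
class MonitoringPlan:
    """Illustrative per-system monitoring plan; all fields are assumptions."""
    system_name: str
    risk_tier: str                 # e.g. "high", "medium", "low"
    metrics: dict                  # metric name -> alert threshold
    review_frequency_days: int
    intended_use_boundaries: list = field(default_factory=list)

# Higher-risk systems warrant more metrics and more frequent review.
credit_plan = MonitoringPlan(
    system_name="credit-scoring-v3",   # hypothetical system
    risk_tier="high",
    metrics={"auc": 0.80, "approval_rate_shift": 0.05},
    review_frequency_days=7,
    intended_use_boundaries=["retail applicants only"],
)
```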

Performance Monitoring

The organisation shall monitor AI system performance against the performance criteria defined in the requirements specification, using metrics appropriate to the system's function and the nature of its outputs. Performance monitoring shall be conducted at defined intervals or continuously where the system's operational role makes this necessary.

Where ground truth labels are available for operational data, the organisation shall use these to evaluate actual system accuracy in operation. Where ground truth is unavailable or delayed, proxy indicators of performance shall be defined and monitored. Monitoring results shall be compared against defined thresholds, and deviations shall trigger defined review or escalation actions.
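
As a minimal sketch of threshold-based performance monitoring, the following Python function evaluates accuracy where ground truth is available and falls back to a proxy indicator (here, mean model confidence) where it is not. The function name, thresholds, and choice of proxy are assumptions for illustration; in practice they would come from the requirements specification.

```python
import numpy as np

def check_performance(y_pred, y_true=None, proxy_scores=None,
                      accuracy_floor=0.90, proxy_floor=0.60):
    """Evaluate one monitoring interval; thresholds are illustrative.

    Uses ground-truth accuracy when labels are available, otherwise a
    proxy indicator (mean model confidence).
    """
    if y_true is not None:
        value = float(np.mean(np.asarray(y_pred) == np.asarray(y_true)))
        return "accuracy", value, value < accuracy_floor
    value = float(np.mean(proxy_scores))
    return "mean_confidence", value, value < proxy_floor

metric, value, breached = check_performance(y_pred=[1, 0, 1, 1],
                                            y_true=[1, 0, 0, 1])
if breached:
    print(f"{metric}={value:.2f} is below threshold; escalate for review")
```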

Detecting Distributional Shift and Model Drift

The organisation shall implement mechanisms to detect distributional shift in the inputs presented to the AI system in operation, as changes in input distributions may indicate that the system is operating outside the conditions for which it was developed and validated. Monitoring shall also track indicators of model drift over time, including changes in output distributions that are not explained by corresponding changes in inputs.

Where shift or drift is detected, a structured assessment shall be conducted to determine whether system performance remains within acceptable bounds and whether remediation, such as retraining, recalibration, or operational restrictions, is required.
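
One widely used detection technique, sketched below, is a two-sample Kolmogorov-Smirnov test comparing a recent operational window of a feature against a training-time reference. The synthetic data and the alert threshold are assumptions; production implementations often use alternative metrics such as the Population Stability Index.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=0)
reference = rng.normal(0.0, 1.0, 5000)    # feature values from training data
operational = rng.normal(0.3, 1.0, 5000)  # recent production window

# Two-sample Kolmogorov-Smirnov test: a small p-value indicates the
# operational distribution has diverged from the training reference.
result = ks_2samp(reference, operational)
if result.pvalue < 0.01:  # alert threshold is an assumption
    print(f"Input shift detected (KS statistic {result.statistic:.3f}); "
          "trigger a structured assessment")
```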

Monitoring for Adverse Outcomes and Fairness

The organisation shall implement monitoring to detect adverse outcomes associated with AI system outputs, including outcomes that may indicate the system is causing harm to individuals or producing discriminatory results. Where the system informs decisions that affect individuals, monitoring shall include tracking of outcome distributions across relevant population groups to enable the detection of differential impact.

Reports of adverse outcomes, whether received through formal feedback mechanisms or identified through operational monitoring, shall be assessed and acted upon in a timely manner.
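
By way of illustration, the sketch below tracks favorable-outcome rates across population groups and flags potential differential impact using the widely cited four-fifths ratio. The group labels, data, and 0.8 alert level are assumptions rather than requirements of the control.

```python
from collections import defaultdict

def outcome_rates_by_group(decisions):
    """decisions: iterable of (group_label, favorable: bool) records.

    Returns per-group favorable-outcome rates and the ratio of the lowest
    to the highest rate across groups.
    """
    totals, favorable = defaultdict(int), defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        favorable[group] += int(ok)
    rates = {g: favorable[g] / totals[g] for g in totals}
    return rates, min(rates.values()) / max(rates.values())

rates, ratio = outcome_rates_by_group(
    [("group_a", True), ("group_a", True), ("group_a", False),
     ("group_b", True), ("group_b", False), ("group_b", False)])
if ratio < 0.8:  # 'four-fifths' rule of thumb, an assumption here
    print(f"Possible differential impact: {rates} (ratio {ratio:.2f})")
```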

Monitoring Governance and Review

Monitoring results shall be reviewed at defined intervals by personnel with appropriate responsibility and technical competence. Review processes shall consider whether monitoring findings indicate issues requiring remediation, whether monitoring metrics and thresholds remain appropriate, and whether the scope of monitoring continues to address the most significant risks associated with the system.

Monitoring records, including data collected, analysis conducted, and actions taken, shall be retained in accordance with the organisation's document management policy.
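
A minimal sketch of what a retained monitoring record might look like follows. The field names and the append-only JSON-lines store are assumptions; actual retention periods and storage would follow the organisation's document management policy.

```python
import json
from datetime import datetime, timezone

def append_monitoring_record(path, system, metric, value, threshold, action):
    """Append one review record to an append-only JSON-lines log."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "system": system,
        "metric": metric,
        "value": value,
        "threshold": threshold,
        "action_taken": action,
    }
    with open(path, "a") as log:
        log.write(json.dumps(record) + "\n")

append_monitoring_record("monitoring_log.jsonl", "credit-scoring-v3",
                         "accuracy", 0.87, 0.90,
                         "escalated to model owner for review")
```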

Escalation and Response

The monitoring framework shall define escalation pathways for monitoring findings that indicate performance below acceptable thresholds, unexpected behavioural anomalies, or indicators of harm. Escalation criteria, response procedures, and the responsibilities of personnel at each stage of the escalation process shall be clearly documented and communicated to relevant parties.

In cases where monitoring findings indicate that the AI system poses unacceptable risk if continued in operation, the organisation shall have the capability to suspend or restrict system use promptly pending investigation and remediation.
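
To illustrate how escalation criteria, responsibilities, and a suspension capability might fit together, the sketch below maps hypothetical finding types to severity levels and responses. Every finding type, severity band, and responder in it is an assumption to be replaced by the organisation's documented procedures.

```python
# Hypothetical escalation table: (finding type, severity, responder, suspend?)
ESCALATION_RULES = [
    ("metric_below_threshold", "minor",    "model owner",        False),
    ("sustained_drift",        "major",    "AI governance lead", False),
    ("indicator_of_harm",      "critical", "incident response",  True),
]

def escalate(finding_type):
    """Route a monitoring finding per the table; unmatched findings are triaged."""
    for finding, severity, responder, suspend in ESCALATION_RULES:
        if finding == finding_type:
            print(f"[{severity}] notify {responder}")
            if suspend:
                print("Suspending system pending investigation and remediation")
            return severity
    return "unclassified"

escalate("indicator_of_harm")
```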


Related Controls

  • A.6.2.6 – AI System Verification and Validation: Monitoring metrics and thresholds shall be informed by the performance criteria and acceptance thresholds established during verification and validation.
  • A.6.2.7 – AI System Deployment: Monitoring shall commence from the point of deployment, with particular attention to system behaviour during the initial operational period.
  • A.8.2 – AI System Incident Management: Monitoring findings that indicate significant issues shall be escalated and managed through the AI incident management process.
  • A.7.6 – AI System Change Management: Monitoring findings may trigger change management activities, including retraining, recalibration, or more substantive system modifications.
  • A.6.1.2 – AI Risk Assessment: Monitoring results shall feed back into periodic reviews of the AI risk assessment, ensuring that risk assessments reflect current operational experience.