ISO 42001:2023 - A.7.4 AI System Availability and Continuity
This article provides guidance on implementing the ISO 42001:2023 A.7.4 AI System Availability and Continuity control.
ISO 42001 Control Description
The organisation shall identify the availability requirements for AI systems in operational use, implement controls to meet those requirements, and establish continuity arrangements to maintain or restore AI system services in the event of disruption.
Control Objective
To ensure that AI systems deliver the levels of availability required by the business processes and stakeholders that depend upon them, and that the organisation has the capacity to maintain or recover AI system services in response to disruptions, including technical failures, data issues, and broader operational incidents.
Purpose
As AI systems become integral to organisational processes, their unavailability or degraded performance can have material consequences for operations, customers, and in some cases for individuals whose access to services or benefits depends on AI-mediated decisions. The availability requirements for an AI system — and the consequences of failing to meet them — are therefore important parameters that must be determined and managed as part of the system's operational governance.
AI systems may be subject to availability risks that differ from those affecting conventional IT systems. In addition to infrastructure failures, AI systems may experience functional unavailability arising from data pipeline failures, degraded model performance following distributional shift, or the need to suspend operation pending investigation of unexpected behaviours. Continuity planning for AI systems must account for this broader range of availability risk.
Availability and continuity measures must also be proportionate to the role the AI system plays in organisational and broader societal contexts. Systems that support critical decisions or services warrant more extensive and rigorously tested continuity arrangements than systems used for lower-stakes or easily substituted functions.
Guidance on Implementation
Establishing Availability Requirements
The organisation shall define availability requirements for each AI system based on the needs of the business processes and stakeholders it serves. Availability requirements shall address the acceptable levels of planned and unplanned downtime, the maximum acceptable recovery time following disruption, the acceptable degradation of performance during periods of reduced capacity, and any regulatory or contractual availability obligations.
Availability requirements shall be documented as part of the AI system's operational specifications and shall be agreed with relevant business stakeholders before the system enters operational use.
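As a minimal sketch of how such requirements might be captured as part of the system's operational specifications, the record below expresses the downtime, recovery-time, degradation, and sign-off elements named above as a structured object. All field names, the example system, and the thresholds are illustrative assumptions, not values mandated by the standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AvailabilityRequirement:
    """Documented availability requirements for one AI system.

    Field names and values are illustrative, not prescribed by ISO 42001.
    """
    system_id: str
    availability_target_pct: float       # e.g. 99.5% uptime target
    max_recovery_time_minutes: int       # maximum acceptable time to restore service
    max_degraded_performance_pct: float  # acceptable performance drop under reduced capacity
    regulatory_obligations: tuple = ()   # regulatory or contractual availability clauses
    agreed_by: tuple = ()                # business stakeholders who signed off pre-deployment

    def max_monthly_downtime_minutes(self) -> float:
        """Translate the availability target into a downtime budget (30-day month)."""
        return 30 * 24 * 60 * (100.0 - self.availability_target_pct) / 100.0

# Hypothetical example record for a claims-triage model
req = AvailabilityRequirement(
    system_id="claims-triage-model",
    availability_target_pct=99.5,
    max_recovery_time_minutes=60,
    max_degraded_performance_pct=10.0,
    agreed_by=("claims-operations", "risk-office"),
)
```

Expressing the requirement as a frozen record makes the stakeholder agreement auditable and lets downstream monitoring derive concrete thresholds (such as a monthly downtime budget) from the agreed target.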
Availability Risk Assessment
The organisation shall assess the risks to AI system availability, identifying potential sources of disruption and their potential frequency and impact. Availability risks to be assessed include infrastructure failures affecting compute or storage resources; data pipeline failures or data quality degradation that prevent or compromise system operation; model failures or anomalous behaviour requiring suspension; third-party service failures affecting dependent components; and security incidents that affect system integrity or availability.
Availability risks shall be assessed in the context of the AI system's operational criticality and the consequences of unavailability for dependent processes and stakeholders.
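One simple way to weight availability risks by operational criticality, as described above, is a scored register. The sketch below is an assumed scoring scheme (1-5 likelihood and impact scales, with a criticality multiplier); the risk names mirror the disruption sources listed in the guidance, and all weights are illustrative.

```python
# Illustrative scoring: likelihood and impact on 1-5 scales, with impact
# amplified by the AI system's operational criticality. Weights are assumptions.
CRITICALITY_WEIGHT = {"low": 1.0, "medium": 1.5, "high": 2.0}

def availability_risk_score(likelihood: int, impact: int, criticality: str) -> float:
    """Score one availability risk in the context of system criticality."""
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("likelihood and impact must be on a 1-5 scale")
    return likelihood * impact * CRITICALITY_WEIGHT[criticality]

# Risk sources drawn from the guidance; scores are hypothetical examples.
risks = {
    "infrastructure_failure":   availability_risk_score(2, 4, "high"),
    "data_pipeline_failure":    availability_risk_score(3, 4, "high"),
    "model_anomaly_suspension": availability_risk_score(2, 5, "high"),
    "third_party_outage":       availability_risk_score(3, 3, "medium"),
}

# Rank risks to prioritise availability controls and continuity investment.
ranked = sorted(risks, key=risks.get, reverse=True)
```

Ranking the register in this way gives a defensible basis for directing control and continuity spend toward the highest-scoring disruption sources first.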
Technical Availability Controls
The organisation shall implement technical controls appropriate to the availability requirements and risk assessment, which may include redundant infrastructure, automated failover mechanisms, performance monitoring and alerting, and capacity management processes. Where the AI system depends on external data sources or third-party services, the availability characteristics of those dependencies shall be understood and reflected in availability controls and continuity planning.
Technical availability controls shall be tested at defined intervals to confirm that they function as intended, and test results shall be documented.
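An automated failover mechanism of the kind listed above can be sketched as a health-aware routing wrapper: requests go to the primary model unless it errors or its monitored quality signal is degraded, in which case a fallback path serves a reduced but available response. Both predictor functions here are hypothetical stand-ins, and the simulated outage exists only so the control path can be exercised.

```python
# Minimal failover sketch. primary_predict and fallback_predict are assumed
# placeholders for a real inference backend and its degraded-mode alternative.

def primary_predict(request):
    raise TimeoutError("inference backend unreachable")  # simulated outage

def fallback_predict(request):
    return "deferred-for-manual-review"  # degraded but available response

def predict_with_failover(request, quality_ok: bool = True):
    """Serve from the primary model; fail over on error or degraded quality.

    quality_ok would come from performance monitoring (e.g. a drift or
    accuracy alert), so failover also covers functional unavailability.
    """
    if quality_ok:
        try:
            return primary_predict(request)
        except Exception:
            pass  # in a real deployment: log the failure and raise an alert
    return fallback_predict(request)
```

Because the wrapper itself is plain code, it can be exercised at defined intervals (for example by injecting a simulated outage in a test environment) and the results documented, satisfying the testing expectation above.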
Continuity Arrangements
The organisation shall establish continuity arrangements that define how AI system services will be maintained or restored in the event of disruption. Continuity arrangements shall address the prioritisation of recovery activities, the procedures for restoring system functionality, the conditions under which alternative processes shall be invoked to perform functions normally supported by the AI system, and the criteria for resuming normal AI system operation following recovery.
Where a disruption to an AI system would prevent the execution of business processes that affect individuals — such as the processing of service applications or the delivery of benefits — continuity arrangements shall include provisions for manual or alternative processing to ensure that individuals are not materially disadvantaged by the system's unavailability.
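The invocation and resumption conditions described above can be expressed as a small state machine: alternative (e.g. manual) processing is invoked once an outage exceeds the agreed maximum recovery time, and normal operation resumes only once recovery has been verified against the defined criteria. The states, the 60-minute default, and the verification flag are illustrative assumptions.

```python
from enum import Enum

class ServiceMode(Enum):
    NORMAL = "normal"            # AI system serving requests
    ALTERNATIVE = "alternative"  # manual/alternative processing invoked

def next_mode(mode: ServiceMode, outage_minutes: float,
              recovery_verified: bool, max_recovery_time: float = 60) -> ServiceMode:
    """Invoke alternative processing when an outage exceeds the agreed
    recovery time; resume normal operation only once recovery is verified."""
    if mode is ServiceMode.NORMAL and outage_minutes > max_recovery_time:
        return ServiceMode.ALTERNATIVE
    if mode is ServiceMode.ALTERNATIVE and recovery_verified:
        return ServiceMode.NORMAL
    return mode
```

Making the transition criteria explicit in this way ensures that the decision to route service applications to manual processing, and the decision to return them to the AI system, are both governed by agreed thresholds rather than ad-hoc judgement during an incident.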
Testing and Maintenance of Continuity Arrangements
Continuity arrangements shall be tested at defined intervals to confirm their effectiveness. Test results shall be documented and reviewed, and any identified deficiencies shall be addressed through remediation activities. Continuity arrangements shall be reviewed and updated following significant changes to the AI system, its operational environment, or the business processes it supports.
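A lightweight way to document and review test results, as required above, is a structured log of continuity exercises from which open deficiencies can be extracted for remediation tracking. The record shape, scenarios, and dates below are hypothetical examples.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ContinuityTest:
    """One documented continuity exercise and its outcome."""
    test_date: date
    scenario: str
    passed: bool
    deficiencies: list = field(default_factory=list)

# Hypothetical test log for periodic continuity exercises
log = [
    ContinuityTest(date(2024, 3, 1), "failover to standby inference cluster", True),
    ContinuityTest(date(2024, 9, 1), "manual processing of service applications", False,
                   ["runbook out of date", "staff unfamiliar with escalation path"]),
]

# Deficiencies from failed exercises feed the remediation backlog.
open_items = [d for t in log if not t.passed for d in t.deficiencies]
```

Reviewing this log after significant changes to the AI system or its environment also provides a natural trigger for updating the continuity arrangements themselves.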
Awareness and Training
Personnel with responsibilities for AI system continuity shall be appropriately trained and shall be aware of their roles and responsibilities in the event of a disruption. Continuity procedures shall be accessible to relevant personnel and shall be exercised frequently enough that key personnel remain familiar with them.
Related Controls
- A.7.2 – Establishing Processes, Functions and Tools for AI Operation: Availability and continuity requirements shall be reflected in the operational framework and the design of operational processes.
- A.7.5 – AI System Monitoring: Monitoring activities shall include tracking of availability metrics and alerting for conditions that may indicate an emerging availability issue.
- A.8.2 – AI System Incident Management: Availability incidents shall be managed in accordance with the AI incident management process, and continuity arrangements shall be integrated with incident response procedures.
- A.9.3 – AI System Supply Chain: The availability characteristics of third-party components and services shall be assessed and managed as part of supply chain governance.
- A.6.1.2 – AI Risk Assessment: Availability risks shall be assessed as part of the broader AI risk assessment, and the risk treatment decisions shall inform availability control and continuity investment decisions.