Two instruments now define the outer walls of enterprise AI governance for organizations operating in Europe or serving European customers: one is a voluntary standard, the other is binding law. ISO/IEC 42001, published in December 2023, is the first certifiable management system standard for AI: a Plan-Do-Check-Act (PDCA) framework that gives organizations a structured, auditable path to responsible AI management. The EU AI Act, which entered into force in August 2024 and applies in stages over the following years, imposes graduated obligations based on risk classification. Together they create a compliance architecture that is more demanding than most of what the enterprise software industry has previously encountered. For agentic systems specifically, that architecture contains several provisions that require careful engineering, not just careful documentation.
ISO/IEC 42001: The PDCA Backbone
ISO/IEC 42001 is structured like other ISO management system standards, such as ISO 9001 for quality and ISO/IEC 27001 for information security, around the Plan-Do-Check-Act cycle. An organization that implements 42001 establishes an AI management system: a documented set of policies, processes, roles, and records that demonstrates it is managing AI risks in a systematic and continuously improving way. The standard is certifiable, meaning an accredited third-party certification body can assess conformity and issue a certificate. That capability matters more and more as enterprise customers and regulators begin requiring documented AI governance from their vendors and counterparties.
Annex A of the standard contains a library of controls grouped under nine objective areas: policies related to AI, internal organization, resources for AI systems, assessing impacts of AI systems, AI system life cycle, data for AI systems, information for interested parties, use of AI systems, and third-party and customer relationships. For agentic systems, the most operationally demanding expectations concern explainability, robustness testing, and human oversight, which surface mainly through the life cycle, information, and use controls rather than in a single technical domain. An agent that cannot provide a human-interpretable account of why it took a specific action at a specific moment will struggle to satisfy an auditor on explainability; an agent that has not been tested against adversarial inputs will struggle on robustness; and an agent deployed without a documented mechanism for human override will struggle on human oversight.
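One way to make the three checks above operational is a pre-deployment gate. The sketch below is illustrative only: `AgentEvidence`, its fields, and `release_gaps` are hypothetical names, not terms from ISO/IEC 42001, and a real audit would examine the underlying evidence rather than boolean flags.

```python
from dataclasses import dataclass


@dataclass
class AgentEvidence:
    """Hypothetical evidence summary for one agent release."""
    name: str
    has_action_rationale: bool      # human-readable account of each action
    adversarial_tests_passed: bool  # tested against adversarial inputs
    override_documented: bool       # documented human override mechanism


def release_gaps(evidence: AgentEvidence) -> list[str]:
    """Return the areas where supporting evidence is missing."""
    checks = {
        "explainability": evidence.has_action_rationale,
        "robustness testing": evidence.adversarial_tests_passed,
        "human oversight": evidence.override_documented,
    }
    return [area for area, ok in checks.items() if not ok]


# Example: an agent with no adversarial test evidence on file.
agent = AgentEvidence("invoice-agent", True, False, True)
print(release_gaps(agent))
```

An empty list means all three areas have evidence on file; anything else names the gap to close before release.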
The standard also requires an AI impact assessment process, analogous to the data protection impact assessments (DPIAs) required under GDPR, that evaluates the potential impacts of an AI system on individuals, organizations, and society before deployment and at regular intervals thereafter. For high-autonomy agents with the ability to affect significant resources or make consequential decisions, this assessment is a substantial undertaking that requires input from legal, ethics, and subject matter experts who understand the deployment context.
EU AI Act: Obligations for High-Risk Systems
The EU AI Act classifies AI systems into four risk tiers: unacceptable risk (prohibited), high risk (subject to conformity assessment and ongoing obligations), limited risk (transparency requirements), and minimal risk (no specific obligations). Most agentic systems deployed in consequential enterprise contexts — those affecting employment decisions, credit assessments, safety-critical operations, or public services — are likely to qualify as high-risk under Annex III of the Act. High-risk classification triggers a comprehensive set of obligations under Articles 8 through 15 that collectively define one of the most demanding technical compliance regimes in the history of software regulation.
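The four tiers lend themselves to a first-pass triage step in an intake workflow. The following is a minimal sketch under stated assumptions: the context tags are hypothetical labels derived from the examples in the paragraph above, not the Act's wording, and anything the table does not recognize is routed to legal review rather than assumed to be low risk.

```python
from enum import Enum
from typing import Optional


class RiskTier(Enum):
    UNACCEPTABLE = "prohibited"
    HIGH = "conformity assessment and ongoing obligations"
    LIMITED = "transparency requirements"
    MINIMAL = "no specific obligations"


# Hypothetical tags for illustration; real classification is a legal analysis.
HIGH_RISK_CONTEXTS = {
    "employment_decisions",
    "credit_assessment",
    "safety_critical_operations",
    "public_services",
}


def provisional_tier(context: str) -> Optional[RiskTier]:
    """First-pass triage. None means: unknown, send to legal review."""
    if context in HIGH_RISK_CONTEXTS:
        return RiskTier.HIGH
    return None


for ctx in ("credit_assessment", "meeting_summaries"):
    tier = provisional_tier(ctx)
    print(ctx, "->", tier.name if tier else "needs legal review")
```

Returning `None` instead of defaulting to `MINIMAL` is a deliberate choice: a triage table should never be the thing that declares a system out of scope.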
Article 9 requires a risk management system that is continuously updated throughout the system's lifecycle: not a one-time risk assessment but an ongoing process that tracks identified risks, their likelihood and severity, and the measures taken to mitigate them. Article 10 requires that training, validation, and testing data meet defined quality criteria. The text is written around the data sets used to develop the model, but for agents it is prudent to apply the same discipline to any fine-tuning data, retrieval corpus, or tool response data that the agent uses at runtime. Article 11 requires technical documentation comprehensive enough to allow national competent authorities and notified bodies to assess conformity, including documentation of the system's intended purpose, its performance characteristics, its limitations, and its foreseeable misuse scenarios.
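The ongoing process that Article 9 describes can be pictured as a living risk register. This is a sketch, not a prescribed format: the 1 to 5 scales, the 90-day review window, the score threshold of 12, and all names (`Risk`, `needs_attention`) are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Risk:
    """One entry in a hypothetical living risk register."""
    description: str
    likelihood: int   # 1 (rare) to 5 (frequent); scale is an assumption
    severity: int     # 1 (minor) to 5 (critical); scale is an assumption
    mitigations: list[str] = field(default_factory=list)
    last_reviewed: date = field(default_factory=date.today)

    @property
    def score(self) -> int:
        return self.likelihood * self.severity


def needs_attention(risks: list[Risk], today: date,
                    max_age_days: int = 90, threshold: int = 12) -> list[Risk]:
    """Flag risks whose review is stale, or that score high with no mitigation."""
    flagged = []
    for r in risks:
        stale = (today - r.last_reviewed).days > max_age_days
        unmitigated = r.score >= threshold and not r.mitigations
        if stale or unmitigated:
            flagged.append(r)
    return flagged


register = [
    Risk("Agent issues refund above policy limit", 3, 4,
         ["hard cap in payment tool"], date(2025, 1, 10)),
    Risk("Prompt injection via retrieved document", 4, 4,
         [], date(2025, 3, 1)),
]
for r in needs_attention(register, today=date(2025, 3, 15)):
    print(r.description, r.score)
```

Running the check on a schedule, rather than once at launch, is what turns the register from a document into a process.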
Article 12 requires automatic logging of events throughout the system's operation, with specific requirements for the logging of operations that are "relevant for the identification of risks." For agents, this is best read as logging not just the final output but the intermediate steps: the tool calls made, the data retrieved, and the reasoning produced at each planning step. Article 13 requires transparency toward deployers that is sufficient for them to interpret the system's output and use it appropriately; the separate duty to disclose to people that they are interacting with an AI system sits in Article 50. Article 14 requires human oversight mechanisms that are effective, meaning the oversight must be technically capable of preventing or minimizing risks, not merely a paper control. Article 15 requires that the system achieve an appropriate level of accuracy, robustness, and cybersecurity, with specific attention to resilience against attempts to alter outputs through adversarial manipulation.
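The difference between a paper control and a technically effective one is easiest to see in code. In the sketch below the oversight gate sits in the execution path, so an action cannot run without passing through it. Everything here is hypothetical: the `OversightGate` class, the EUR 100 auto-approval limit, and the `approver` callback standing in for a human review step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str
    amount_eur: float


class OversightGate:
    """Blocks execution unless policy or a human reviewer allows it."""

    def __init__(self, approver: Callable[[Action], bool],
                 auto_limit_eur: float = 100.0) -> None:
        self.approver = approver              # hypothetical human-review hook
        self.auto_limit_eur = auto_limit_eur  # assumed policy threshold
        self.halted = False                   # stop-button state

    def halt(self) -> None:
        """Operator stop: nothing further runs while halted."""
        self.halted = True

    def execute(self, action: Action, run: Callable[[Action], str]) -> str:
        if self.halted:
            return "blocked: agent halted by operator"
        if action.amount_eur > self.auto_limit_eur and not self.approver(action):
            return "blocked: reviewer declined"
        return run(action)


def deny_all(action: Action) -> bool:
    return False  # stand-in reviewer that rejects everything


def pay(action: Action) -> str:
    return f"executed {action.name}"


gate = OversightGate(approver=deny_all)
print(gate.execute(Action("refund", 40.0), pay))
print(gate.execute(Action("refund", 900.0), pay))
gate.halt()
print(gate.execute(Action("refund", 40.0), pay))
```

The design point is that `run` is only ever called from inside `execute`: the reviewer and the stop button can prevent an action, not merely observe it afterward.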
"The EU AI Act's Article 14 requirement for effective human oversight is the one provision that most directly constrains the autonomy architecture of a deployed agent. 'Effective' means technically capable of intervention, not merely nominally available."
Conformity Assessment and the Audit Trail
For high-risk AI systems, the EU AI Act requires a conformity assessment before the system can be placed on the market or put into service. Third-party assessment by a notified body, an organization designated by a national authority, applies mainly to biometric systems where harmonised standards have not been applied in full, and to AI embedded in products that already require third-party assessment under sectoral EU legislation. For the other Annex III categories, including critical infrastructure and employment, the provider self-assesses through internal control, but it must still draw up an EU declaration of conformity and register the system, in most cases in the public EU database for high-risk AI systems, before deployment.
The audit trail that supports conformity assessment is substantially more demanding than the audit trail required for conventional software compliance. It must include: the technical documentation required under Article 11; records of the quality management system required under Article 17; records of the post-market monitoring system required under Article 72; and, for any serious incidents, the incident report required under Article 73. For agents specifically, the requirement to log "operations relevant for the identification of risks" under Article 12 is most safely met with complete execution traces: not summarized logs but the full sequence of model calls, tool invocations, and state transitions that constitute each agent run.
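A complete execution trace of the kind described above might be captured with an append-only recorder that every step of the agent writes to. This is a minimal sketch: the event schema, the `RunTrace` class, and the example agent and tool names are all hypothetical, and a production system would also need durable, access-controlled storage.

```python
import json
import time
import uuid
from typing import Any


class RunTrace:
    """Append-only, in-memory trace of one agent run (hypothetical schema)."""

    def __init__(self, agent: str) -> None:
        self.run_id = str(uuid.uuid4())
        self.agent = agent
        self.events: list[dict[str, Any]] = []

    def record(self, kind: str, **detail: Any) -> None:
        """Append one event; seq preserves the order of steps within the run."""
        self.events.append({
            "run_id": self.run_id,
            "seq": len(self.events),
            "ts": time.time(),
            "agent": self.agent,
            "kind": kind,  # e.g. "model_call", "tool_call", "state_transition"
            "detail": detail,
        })

    def to_jsonl(self) -> str:
        """Serialize as one JSON object per line for a log pipeline."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.events)


trace = RunTrace("loan-triage-agent")
trace.record("model_call", reasoning="Need applicant income before scoring")
trace.record("tool_call", tool="get_income", args={"applicant_id": "A-17"})
trace.record("state_transition", old="collecting", new="scoring")
trace.record("model_call", output="refer_to_human")
print(trace.to_jsonl())
```

Recording each step as its own event, rather than a summary at the end of the run, is what lets a reviewer reconstruct the sequence that led to a given output.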
The FRIA Frontier
The EU AI Act introduces a new instrument that has no direct analog in prior EU technology regulation: the Fundamental Rights Impact Assessment (FRIA), set out in Article 27. It must be carried out before first use by deployers of high-risk AI systems that are public bodies or private entities providing public services, and by deployers of certain systems used for credit scoring and for life and health insurance pricing. The FRIA extends the standard impact assessment framework to explicitly consider the impacts of the AI system on fundamental rights, meaning the rights protected under the EU Charter of Fundamental Rights. For agents operating in contexts that affect access to essential services, employment, education, or justice, the FRIA is a significant undertaking.
Conducting a FRIA for an agentic system requires engaging with questions that are genuinely novel: how does the agent's behavior affect the rights of the people it interacts with, and how does that effect change when the agent is operating at the scale and speed that autonomous systems make possible? An agent that processes hundreds of loan applications per hour, making decisions that affect people's access to credit, is not simply a faster version of a human loan officer; it is a qualitatively different kind of decision-maker whose systematic biases can propagate at a speed and scale that makes them far harder to detect and correct. The FRIA is designed to surface exactly these systemic effects before deployment, rather than after they have caused harm at scale.
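The scale argument can be made concrete with a small monitoring sketch that compares approval rates across groups in a batch of decisions and flags large gaps. The group labels, the synthetic batch, and the 0.8 ratio threshold are assumptions for illustration only; the Act does not prescribe this metric, and a real FRIA would look at far more than one ratio.

```python
from collections import defaultdict


def approval_rates(decisions: list[tuple[str, bool]]) -> dict[str, float]:
    """Approval rate per group from (group, approved) pairs."""
    totals: dict[str, int] = defaultdict(int)
    approved: dict[str, int] = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / totals[g] for g in totals}


def disparity_flag(rates: dict[str, float], min_ratio: float = 0.8) -> bool:
    """True when the lowest rate falls below min_ratio of the highest."""
    highest = max(rates.values())
    if highest == 0:
        return False  # nobody approved, so there is no gap to measure
    return min(rates.values()) / highest < min_ratio


# Synthetic batch: 100 decisions per group, invented for this example.
batch = (
    [("A", True)] * 60 + [("A", False)] * 40
    + [("B", True)] * 40 + [("B", False)] * 60
)
rates = approval_rates(batch)
print(rates, disparity_flag(rates))
```

At hundreds of applications per hour, a check like this can run on every batch, which is one way to catch a systematic skew in hours rather than months.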