Governing What We Don’t Fully Control
We are living in an era of optimisation.
Workflows, trackers, automations, wearables. All promising more efficiency, accelerated by AI.
In minutes, I can deploy small agents to manage follow-ups, reminders, planning. Systems that quietly shape how my days unfold.
But if one person can build a working agent in minutes, what happens when organisations deploy hundreds across customer journeys, underwriting, claims, credit, operations? Often without fully understanding how they behave together.
Deploying agents at scale is like wanting something that looks right from afar, without yet knowing what it will ask of you. One thing to imagine in theory. Another to live in practice.
At that point, we move from experimentation to governance. From novelty to duty of care. From potential to operational risk.
Google’s recent whitepaper, Introduction to Agents, gives a name to what comes next:
Agent Operations, or AgentOps.
For industries such as finance, insurance, and healthcare, this will not be optional. It will become a core layer of operational risk. A familiar intent applied to systems that will not always behave in predictable ways.
AgentOps, in simple terms, is operational risk for autonomous behaviour.
The foundations remain recognisable:
- Evaluation, where quality is measured against criteria that matter to the business.
- Tracing, where reasoning paths are visible so decisions can be understood and challenged.
- Guardrails, where constraints are built into system behaviour, not stored in documents no model will read.
- Continuous feedback, where events strengthen the system instead of being dismissed as isolated anomalies.
Together, these allow autonomy to operate within boundaries that humans can defend.
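The evaluation and guardrail ideas above can be sketched in a few lines. This is a minimal, hypothetical example, not taken from the whitepaper: the criteria names, threshold, and structure are assumptions chosen to show that quality is multidimensional rather than pass or fail.

```python
from dataclasses import dataclass, field

# Hypothetical evaluation record for a single agent response.
@dataclass
class Evaluation:
    accuracy: float        # 0.0 - 1.0, judged against a reference answer
    policy_aligned: bool   # did the output respect business policy?
    tone_ok: bool          # e.g. checked by a classifier or review rubric
    notes: list[str] = field(default_factory=list)

def guardrail_check(ev: Evaluation, min_accuracy: float = 0.8) -> bool:
    """Return True only if the output clears every guardrail.

    A response can be partially right and still fail: high accuracy
    combined with a policy violation is rejected, because each
    dimension matters independently.
    """
    if ev.accuracy < min_accuracy:
        ev.notes.append(f"accuracy {ev.accuracy:.2f} below {min_accuracy}")
    if not ev.policy_aligned:
        ev.notes.append("policy violation")
    if not ev.tone_ok:
        ev.notes.append("tone outside guidelines")
    return not ev.notes

# A mostly-correct answer that breaches policy is still blocked.
ev = Evaluation(accuracy=0.92, policy_aligned=False, tone_ok=True)
print(guardrail_check(ev))  # False
print(ev.notes)             # ['policy violation']
```

The point of the sketch is the shape of the check, not the specific criteria: constraints live in executable code that the system actually runs, not in documents no model will read.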
Operational Risk, Rewritten
Traditional pass or fail checks work when systems are deterministic. Agents are not.
Quality becomes multidimensional: accuracy, reasoning, data reliability, tone, fairness, policy alignment. A system can be partially right and still lead to the wrong outcome.
Operational risk has already evolved in a similar direction. It moved beyond counting incidents and toward understanding resilience. AgentOps follows the same trajectory.
This evolution introduces two shifts that risk teams cannot ignore.
First, agents blur the line between tool and colleague. They draft documents, recommend actions, and soon may trigger workflows or approve transactions. Policies built for deterministic software may not apply.
Second, many operational losses will not come from a single agent but from the interactions between agents and legacy systems: not from one failure, but from the space in between.
This is where observability becomes essential.
If you can see how a system reached a conclusion, you can understand it.
If you can understand it, you can investigate.
If you can investigate, you can improve.
Governance becomes visibility rather than static approval.
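What "seeing how a system reached a conclusion" means in practice is a trace: an append-only, structured record of each reasoning step. A hypothetical sketch, where the step names and fields are illustrative assumptions:

```python
import json
import time
from typing import Any

def trace_step(trace: list[dict[str, Any]], step: str, detail: dict[str, Any]) -> None:
    """Append one reasoning step to an agent's trace.

    Each entry is timestamped and serialisable, so a reviewer can
    replay how the agent reached its conclusion and challenge it.
    """
    trace.append({"ts": time.time(), "step": step, **detail})

trace: list[dict[str, Any]] = []
trace_step(trace, "retrieve", {"source": "claims_db", "records": 3})
trace_step(trace, "reason", {"rule": "exclusion_clause_check", "result": "flagged"})
trace_step(trace, "recommend", {"action": "escalate_to_human"})

# The full reasoning path is now auditable as structured data.
print(json.dumps(trace, indent=2))
```

Once traces exist as data, governance can query them: which rules fired, which sources were consulted, where a decision diverged from expectation.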
And human oversight does not disappear. It becomes a supervisory layer.
Feedback must be structured, consistent, and connected to performance and risk indicators. Anyone who has worked in a second line function will recognise the mindset. It is not new thinking. It is a new application of it.
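Connecting structured feedback to a risk indicator can start as something as simple as aggregating outcomes over a window. A hypothetical sketch: the event schema, outcome labels, and threshold are assumptions, standing in for whatever a second line function would actually define.

```python
from collections import Counter

# Hypothetical structured feedback events from agent interactions.
events = [
    {"agent": "claims_triage", "outcome": "ok"},
    {"agent": "claims_triage", "outcome": "policy_breach"},
    {"agent": "claims_triage", "outcome": "ok"},
    {"agent": "claims_triage", "outcome": "escalated"},
]

def breach_rate(events: list[dict[str, str]]) -> float:
    """Share of interactions that breached policy: a simple key risk indicator."""
    counts = Counter(e["outcome"] for e in events)
    return counts["policy_breach"] / len(events)

rate = breach_rate(events)
print(f"policy breach rate: {rate:.0%}")
# A threshold turns the indicator into a supervisory trigger.
ALERT_THRESHOLD = 0.10
print("review required" if rate > ALERT_THRESHOLD else "within tolerance")
```

The mechanics are familiar from existing key risk indicator reporting; the only new element is that the events come from autonomous behaviour rather than manual processes.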
Takeaway
Agents will not stay at the edges of work. They will shape decisions, reports, claims, credit, and operations. Each interaction will carry risk, accountability and consequence. The strength of AI in organisations will not depend only on the model, but on whether we can operate autonomy with clarity, judgment and oversight.
For operational risk teams, the shift is straightforward: do not try to eliminate variability. Understand it. Instrument it. Supervise it. Autonomy without oversight is fragility. Autonomy with observability becomes a capability.