
SAKET BIVALKAR
Saket’s focus is on helping organisations become flexible and adaptive while ensuring that the people within them grow as well. His experience spans organisations from large, complex global enterprises to small entrepreneurial start-ups.
The Discipline Your AI Transformation Is Missing
When AI can act — not just assist — your operating model must change. Task decomposition, tool permissions, audit trails, and accountability are not controls bolted on after the fact. They are the architecture of safe, scalable agentic work.
The moment an AI agent can trigger a workflow, write into a system of record, send an email, update customer data, approve an exception, or initiate a transaction — you have stopped deploying a productivity tool and started introducing a new type of digital labor into your organization.
That shift demands a discipline most transformation programs have not built. Not more prompting. Not another governance framework left on a SharePoint. Agent management: the operating discipline that makes AI agents safe, effective, and scalable.
“Assistive AI sits at the edge of work. Agentic AI sits inside it — planning, deciding, and executing through tools. That changes what good management means.”
The tipping point: from assistive to acting
Assistive AI is legible and low-stakes. It suggests. It drafts. It summarizes. The human remains the execution engine; the model is a smart co-pilot.
Agentic AI is different in kind. When an agent has tool access — a CRM API, a payment trigger, an identity system — autonomy and consequence arrive together. And the operating questions become immediate:
→ Who owns outcomes when an agent acts?
→ What is the agent allowed to do without approval?
→ What evidence trail exists if something goes wrong?
→ How do you contain impact when an agent makes a bad call?
These are not theoretical. OWASP’s work on agentic application risks identifies tool misuse and identity and privilege abuse as core failure modes — not future concerns, but patterns already showing up in production deployments where agents have autonomy and broad tool access.
If your transformation program does not explicitly address these questions, you are not scaling AI. You are scaling uncertainty.
What is agent management?
Agent management is the combination of four operating disciplines: designing work so agents can act within boundaries; limiting what they can touch; tracing everything they do; and assigning clear human ownership over their outcomes.
These are not controls layered on top of an agentic deployment. They are org design decisions embedded into how work runs. Skip them and you will discover them anyway — during an incident, a compliance review, or an executive conversation that asks: “What exactly did the agent do, and who approved that?”
DISCIPLINE 1
Task decomposition: make work legible for agents
Agents fail most reliably when given vague goals inside messy workflows. The fix is not a better prompt. It is redesigning the workflow before the agent touches it.
Decompose every workflow into three categories:
THREE TASK CATEGORIES
| CATEGORY | DEFINITION | AGENT DEFAULT |
|---|---|---|
| Deterministic | Repeatable steps with clear rules and predictable inputs | Autonomous — low oversight required |
| Judgment | Ambiguity, interpretation, exceptions, trade-offs | Human decision required; agent may draft |
| Irreversible | Payments, contracts, access grants, production changes | Human approval mandatory; dual control where stakes are high |
This decomposition gives you immediate clarity on where bounded autonomy is safe and where a human must remain the decision-maker. The goal is not to restrict agents — it is to design their operating space intentionally.
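As a minimal sketch, the decomposition can be expressed as a classification rule that maps task properties to a default oversight policy. All names below are illustrative, not a prescribed schema:

```python
from enum import Enum

class TaskCategory(Enum):
    DETERMINISTIC = "deterministic"   # clear rules, predictable inputs
    JUDGMENT = "judgment"             # ambiguity, exceptions, trade-offs
    IRREVERSIBLE = "irreversible"     # payments, contracts, access grants

# Default oversight policy per category (hypothetical labels)
DEFAULT_POLICY = {
    TaskCategory.DETERMINISTIC: "autonomous",
    TaskCategory.JUDGMENT: "human_decides_agent_drafts",
    TaskCategory.IRREVERSIBLE: "human_approval_required",
}

def classify_task(has_clear_rules: bool, reversible: bool) -> TaskCategory:
    """Irreversibility dominates; only rule-based, reversible work defaults to autonomy."""
    if not reversible:
        return TaskCategory.IRREVERSIBLE
    return TaskCategory.DETERMINISTIC if has_clear_rules else TaskCategory.JUDGMENT

def default_policy(category: TaskCategory) -> str:
    return DEFAULT_POLICY[category]
```

The key design choice is that irreversibility trumps everything else: even a rule-based task defaults to human approval if its effects cannot be undone.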
DEFINE STOP CONDITIONS BEFORE YOU DEPLOY
For every agentic task, define these parameters before launch:
OUTPUTS What constitutes successful completion?
STOP CONDITIONS When must the agent pause and escalate to a human?
ERROR BUDGET What tolerance exists, and what triggers rollback?
Without explicit stop conditions, “autonomy” is indistinguishable from “guessing.” The template makes bounded autonomy operable, not just theoretical.
DISCIPLINE 2
Tool permissions: agents need access badges
Agents derive real power from tools: CRM updates, ticketing systems, document repositories, internal APIs, financial systems, identity stores. If you give an agent broad credentials to “make it work,” you have created a single point of failure with velocity and scale.
Identity and privilege abuse is explicitly called out by OWASP as a top agentic risk category — and the consistent mitigation is least privilege and tightly scoped capabilities. Not as an afterthought. As the default starting point.
A THREE-TIER PERMISSIONS MODEL
| TIER | ACTIONS | DEFAULT POLICY |
|---|---|---|
| Tier 1 — Safe | Read-only, drafting, recommendations | Fully autonomous |
| Tier 2 — Controlled | Write actions with constraints: field updates, tickets, templated sends | Guardrails + sampling review |
| Tier 3 — High Impact | Money movement, contracts, access grants, sensitive data, production changes | Explicit approval or dual control required |
This turns “human-in-the-loop” from a slogan into a designed mechanism. Humans are engaged at the right tier — not as rubber-stamps on everything, and not absent from things that matter.
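As a sketch, the tiers can be enforced in code with a default-deny stance for any action not explicitly mapped. The tool names and oversight labels are illustrative:

```python
from enum import IntEnum

class Tier(IntEnum):
    SAFE = 1          # read-only, drafting, recommendations
    CONTROLLED = 2    # constrained writes: field updates, tickets
    HIGH_IMPACT = 3   # money movement, access grants, production changes

# Hypothetical mapping of tool actions to tiers
ACTION_TIERS = {
    "crm.read": Tier.SAFE,
    "crm.update_field": Tier.CONTROLLED,
    "payments.refund": Tier.HIGH_IMPACT,
}

def required_oversight(action: str) -> str:
    """Map an action to its oversight policy; unknown actions are blocked."""
    tier = ACTION_TIERS.get(action)
    if tier is None:
        return "prohibited"  # default-deny: least privilege as the starting point
    return {Tier.SAFE: "autonomous",
            Tier.CONTROLLED: "guardrails_plus_sampling",
            Tier.HIGH_IMPACT: "explicit_approval"}[tier]
```

The default-deny branch is the point: an action the permission model has never seen is blocked, not waved through, which is least privilege in its simplest operable form.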
DISCIPLINE 3
Audit trails and observability: evidence is the product
In agentic work, auditability is not optional. It is how you preserve trust, accelerate debugging, satisfy regulators, and have defensible conversations when something goes wrong.
At minimum, you need traceability across these dimensions:
WHAT IT DECIDED Rationale, classification, confidence signals
WHAT IT DID Tool calls, parameters, side effects
WHAT CHANGED Records updated, messages sent, transactions triggered
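A minimal sketch of an append-only audit record covering those dimensions; the schema is illustrative, not a prescribed format:

```python
import json
import time
import uuid

def audit_event(agent_id: str, decision: dict,
                tool_calls: list, changes: list) -> str:
    """Serialize one audit record covering decision, actions, and effects."""
    record = {
        "event_id": str(uuid.uuid4()),   # unique, for cross-referencing
        "timestamp": time.time(),
        "agent_id": agent_id,
        "decided": decision,    # rationale, classification, confidence signals
        "did": tool_calls,      # tool names, parameters, side effects
        "changed": changes,     # records updated, messages sent, transactions
    }
    return json.dumps(record, sort_keys=True)
```

Serializing to a stable, sortable format is deliberate: audit records are evidence, so they should be written once, never mutated, and easy to replay in order during an investigation.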
DESIGN FOR CONTAINMENT, NOT JUST LOGGING
Logging is table stakes. The harder — and more important — question is blast radius: if the agent makes a bad call at scale, can you limit damage?
→ Can you pause the agent immediately without manual intervention?
→ Can you roll back its actions before they propagate downstream?
→ Can you detect abnormal tool sequences, not just abnormal outputs?
→ Can you trace the full tool chain — not only the model’s final response?
OWASP’s agentic risk work points directly at tool misuse patterns: legitimate tools used in unsafe combinations. Your observability must cover the chain of tool calls, not only what the model said.
DISCIPLINE 4
Accountability and escalation: one owner per agent
Every agent needs a named owner accountable for outcomes — not just uptime. Without ownership, incidents become blame loops: the vendor, the model, the prompt, the integration. The answer is never found because no one was asked to own the question.
A clean operating model assigns distinct roles:
AGENT OWNER Owns agent behavior, boundaries, release cadence
TOOL OWNER Owns each integration, permissions, schemas, failure modes
RISK PARTNER Validates evidence, monitoring thresholds, and escalation paths
ISO/IEC 42001 provides a useful backbone for formalizing this as a management system — with continual improvement, defined accountability, and structured processes. Not bureaucracy: operational clarity at scale.
The Hybrid (Human + AI) operating model blueprint: how to implement agent management
Here is a pragmatic blueprint you can apply without turning it into a bureaucratic program.
Step 1: Build an “Agent Register” for production, not for slideware
For each agent, capture:
- Purpose and scope
- Allowed actions (Tier 1 to Tier 3)
- Tools and permissions
- Stop conditions and escalation paths
- Owner, approvers, release cadence
- Evidence requirements (what must be captured for audits and investigations)
This is the operating model equivalent of an org chart. If you cannot list your agents, you cannot manage them.
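A register entry can be as simple as a typed record with a hard requirement for a named owner. The fields mirror the list above; their exact names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class AgentRegisterEntry:
    """One row of the agent register (field names are illustrative)."""
    name: str
    purpose: str
    max_tier: int                 # highest permission tier allowed (1 to 3)
    tools: list[str]              # integrations the agent may call
    stop_conditions: list[str]    # triggers that force escalation
    owner: str                    # named human accountable for outcomes
    evidence_required: list[str]  # what must be captured for audits

REGISTER: dict[str, AgentRegisterEntry] = {}

def register_agent(entry: AgentRegisterEntry) -> None:
    """Admit an agent to production only if it has a named owner."""
    if not entry.owner:
        raise ValueError(f"agent {entry.name!r} needs a named owner")
    REGISTER[entry.name] = entry
```

Rejecting ownerless entries at registration time is the whole trick: accountability becomes a precondition for production, not a cleanup exercise after an incident.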
Step 2: Define decision rights at the edges
Decide explicitly:
- What the agent can do autonomously
- What requires human approval
- What requires dual control
- What is prohibited
Then design the workflow around those decision rights so humans are not asked to rubber-stamp.
Step 3: Implement telemetry that proves value and control
Track dual metrics:
Value metrics
- Cycle time reduction
- Cost-to-serve
- Throughput per team
- Conversion, retention, NPS (where relevant)

Control metrics
- Escalation rate
- Override rate
- Exception backlog
- Incident frequency and severity
- Drift indicators (performance changes over time)
- Tool-call anomalies
If your telemetry is only “usage”, your transformation will fail the executive test: “Show me business impact and control.”
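As a sketch, control metrics can be aggregated directly from agent event logs. The event schema here is assumed, not prescribed:

```python
def control_metrics(events: list[dict]) -> dict:
    """Aggregate basic control metrics from agent event logs (illustrative schema)."""
    total = len(events)
    if total == 0:
        return {"escalation_rate": 0.0, "override_rate": 0.0, "incidents": 0}
    return {
        # share of runs the agent handed back to a human
        "escalation_rate": sum(e.get("escalated", False) for e in events) / total,
        # share of agent decisions a human reversed
        "override_rate": sum(e.get("overridden", False) for e in events) / total,
        # count of high-severity events (severity 3+ on an assumed 1-5 scale)
        "incidents": sum(e.get("severity", 0) >= 3 for e in events),
    }
```

The point of computing these from the same event stream that feeds the audit trail is that value and control reporting cannot drift apart: one log, two views.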
Step 4: Create a lightweight change cadence
Agents change faster than classic enterprise software. Your operating cadence should include:
- Pre-release scenario tests (happy path and edge cases)
- Post-release monitoring and sampling review
- Incident review, with learning captured as design changes
- Rollback procedures that work under pressure
This is how you scale without fear.
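The pre-release scenario step above can be sketched as a small harness that blocks release when any scenario fails. The stub decision function and scenarios are hypothetical:

```python
def run_scenarios(agent_decide, scenarios: list[dict]) -> list[str]:
    """Run pre-release scenarios against an agent's decision function.
    Returns names of failing scenarios; an empty list means safe to release."""
    failures = []
    for s in scenarios:
        outcome = agent_decide(s["input"])
        if outcome != s["expected"]:
            failures.append(s["name"])
    return failures

# Example: a stub decision function and two scenarios (happy path + edge case)
def stub_decide(request):
    return "refund" if request["amount"] <= 100 else "escalate"

scenarios = [
    {"name": "happy_path", "input": {"amount": 50}, "expected": "refund"},
    {"name": "over_limit", "input": {"amount": 500}, "expected": "escalate"},
]
```

The same scenario set then doubles as a regression gate every time the agent's prompt, model, or tools change, which is what makes a fast release cadence safe rather than reckless.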
Why this matters in Europe: governance expectations are rising
The EU AI Act becomes fully applicable on 2 August 2026. Earlier obligations already apply: prohibited practices and AI literacy requirements since 2 February 2025, and governance rules plus obligations for general-purpose AI since 2 August 2025.
The practical implication is simple: if your agents can act, you should be ready to show governance, accountability, and evidence.
Agent management is how you get there without slowing down.
WORKING EXAMPLE
The Refund Agent
An AI agent handles refunds end-to-end: reads the request, verifies purchase data, applies policy, triggers the refund via payment API, updates the CRM, notifies the customer.
If you treat this as a productivity win, you miss the operating model redesign it requires. Run it properly and every discipline applies:
- Task decomposition: triage, decision, execution are separate stages
- Permission tiers: auto-refunds up to €X; above that, approval required
- Audit trail: policy version, evidence used, tool calls, record changes
- Accountability: one owner responsible for outcomes and error budget
- Escalation: fraud signals, ambiguity, and exceptions route to humans
- Containment: agent can be paused and actions rolled back immediately
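Under the assumptions above, the refund flow can be sketched end to end, with triage, a tiered decision, and an audit trail in one function. All names and the €100 auto-approval boundary are illustrative:

```python
AUTO_REFUND_LIMIT_EUR = 100  # hypothetical Tier 2 / Tier 3 boundary

def handle_refund(request: dict, approve) -> dict:
    """Route a refund request through decomposed stages.
    `approve` is a callable standing in for a human approval step."""
    audit = []

    # Stage 1: triage - stop conditions checked before anything else
    if request.get("fraud_signal") or "amount" not in request:
        audit.append("escalated_at_triage")
        return {"status": "escalated", "audit": audit}

    # Stage 2: decision - tiered by amount
    amount = request["amount"]
    if amount <= AUTO_REFUND_LIMIT_EUR:
        audit.append("auto_approved")
    elif approve(request):
        audit.append("human_approved")
    else:
        audit.append("rejected_by_approver")
        return {"status": "rejected", "audit": audit}

    # Stage 3: execution - tool calls recorded for the audit trail
    audit += ["payments.refund", "crm.update", "notify.customer"]
    return {"status": "refunded", "audit": audit}
```

Note how every path, including the rejections, returns an audit list: the evidence trail is produced by the workflow itself, not reconstructed afterwards.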
Multiply this across HR, finance ops, procurement, and sales ops. That is why agentic AI changes org design — not just productivity.
Simulate before you scale
Most organizations discover operating model failures after rollout: exceptions explode, approvals become rubber-stamps, agents get over-permissioned to keep velocity, evidence trails are incomplete.
A DTO preview (Digital Twin of the Organisation) lets you stress-test human-agent handoffs, exception paths, decision rights, and blast radius — before a single agent touches production. The best starting point is one workflow with real operational load: customer support, finance ops, HR, or sales ops.
An AI Registry preview: if you cannot list your agents, you cannot manage them. The AI Registry gives you a live inventory of every agent in production — purpose, permissions, owners, audit status, and risk tier — in one place.
FAQs
What is agent management?
Agent management is the operating discipline for designing, governing, and running AI agents that can act autonomously. It covers four areas: task decomposition (defining where agents can act and where they must stop), tool permissions (least-privilege access tiered by impact), audit trails (full traceability of inputs, decisions, tool calls, and outcomes), and accountability (named ownership and escalation paths for every agent in production).
Why do AI agents require an operating model change, not just governance controls?
Because once agents act, the questions that arise — who owns outcomes, what evidence exists, what can the agent touch — are org design questions, not just compliance ones. Decision rights, escalation paths, and ownership structures must be designed into how work runs, not appended as a controls layer afterward.
What are the biggest risks with agentic AI?
Tool misuse and identity or privilege abuse are the most consistently cited production risks — particularly when agents have broad tool access combined with a high degree of autonomy. Over-permissioning is the most common root cause: it is far easier to grant broad credentials than to design tightly scoped access, especially when teams are under pressure to ship.
What is a "stop condition" for an AI agent?
A stop condition is a pre-defined trigger that causes an agent to pause and escalate to a human rather than continuing to act. Examples include: confidence falling below a threshold, an action classified as Tier 3 (high impact), a fraud signal in an incoming request, or an output that does not match expected completion criteria. Stop conditions convert “bounded autonomy” from a concept into a designed mechanism.
What is a Digital Twin of the Organisation (DTO) and how does it apply to agentic AI?
A DTO is a simulation model of an organization’s workflows, decision points, and human-agent interactions. Applied to agentic AI, it lets teams test human-agent handoffs, exception paths, escalation rules, and blast radius scenarios before agents are deployed in production — reducing the risk of discovering failure modes at scale.