SAKET BIVALKAR

Saket’s focus is on helping organisations become flexible and adaptive while ensuring that the people in them grow as well. His experience spans a range of organisations, from large, complex global enterprises to small entrepreneurial start-ups.

The Discipline Your AI Transformation Is Missing

When AI can act — not just assist — your operating model must change. Task decomposition, tool permissions, audit trails, and accountability are not controls bolted on after the fact. They are the architecture of safe, scalable agentic work.

The moment an AI agent can trigger a workflow, write into a system of record, send an email, update customer data, approve an exception, or initiate a transaction — you have stopped deploying a productivity tool and started introducing a new type of digital labor into your organization.

That shift demands a discipline most transformation programs have not built. Not more prompting. Not another governance framework left on a SharePoint site. Agent management: the operating discipline that makes AI agents safe, effective, and scalable.


“Assistive AI sits at the edge of work. Agentic AI sits inside it — planning, deciding, and executing through tools. That changes what good management means.”


The tipping point: from assistive to acting

 

Assistive AI is legible and low-stakes. It suggests. It drafts. It summarizes. The human remains the execution engine; the model is a smart co-pilot.

Agentic AI is different in kind. When an agent has tool access — a CRM API, a payment trigger, an identity system — autonomy and consequence arrive together. And the operating questions become immediate:

→ Who owns outcomes when an agent acts?

→ What is the agent allowed to do without approval?

→ What evidence trail exists if something goes wrong?

→ How do you contain impact when an agent makes a bad call?

These are not theoretical. OWASP’s work on agentic application risks identifies tool misuse and identity and privilege abuse as core failure modes — not future concerns, but patterns already showing up in production deployments where agents have autonomy and broad tool access.

If your transformation program does not explicitly address these questions, you are not scaling AI. You are scaling uncertainty.


What is agent management?

Agent management is the combination of four operating disciplines: designing work so agents can act within boundaries; limiting what they can touch; tracing everything they do; and assigning clear human ownership over their outcomes.

These are not controls layered on top of an agentic deployment. They are org design decisions embedded into how work runs. Skip them and you will discover them anyway — during an incident, a compliance review, or an executive conversation that asks: “What exactly did the agent do, and who approved that?”


DISCIPLINE 1

Task decomposition: make work legible for agents

Agents fail most reliably when given vague goals inside messy workflows. The fix is not a better prompt. It is redesigning the workflow before the agent touches it.

Decompose every workflow into three categories:

THREE TASK CATEGORIES

CATEGORY | DEFINITION | AGENT DEFAULT
Deterministic | Repeatable steps with clear rules and predictable inputs | Autonomous — low oversight required
Judgment | Ambiguity, interpretation, exceptions, trade-offs | Human decision required; agent may draft
Irreversible | Payments, contracts, access grants, production changes | Human approval mandatory; dual control where stakes are high

This decomposition gives you immediate clarity on where bounded autonomy is safe and where a human must remain the decision-maker. The goal is not to restrict agents — it is to design their operating space intentionally.
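The routing logic this decomposition implies can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation; the category names come from the table above, and the oversight labels are shorthand for the defaults it describes.

```python
from enum import Enum

class TaskCategory(Enum):
    DETERMINISTIC = "deterministic"   # clear rules, predictable inputs
    JUDGMENT = "judgment"             # ambiguity, trade-offs, exceptions
    IRREVERSIBLE = "irreversible"     # payments, contracts, access grants

# Default oversight policy per category, per the table above.
OVERSIGHT = {
    TaskCategory.DETERMINISTIC: "autonomous",
    TaskCategory.JUDGMENT: "human_decides_agent_drafts",
    TaskCategory.IRREVERSIBLE: "human_approval_mandatory",
}

def route(category: TaskCategory) -> str:
    """Return the oversight mode an agent must operate under."""
    return OVERSIGHT[category]

print(route(TaskCategory.IRREVERSIBLE))  # human_approval_mandatory
```

The point of making this explicit in code is that the default becomes inspectable: a reviewer can see at a glance which categories run unattended.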

DEFINE STOP CONDITIONS BEFORE YOU DEPLOY

For every agentic task, define these four parameters before launch:

AGENT TASK TEMPLATE

INPUTS | What evidence must exist before the agent acts?
OUTPUTS | What constitutes successful completion?
STOP CONDITIONS | When must the agent pause and escalate to a human?
ERROR BUDGET | What tolerance exists, and what triggers rollback?

Without explicit stop conditions, “autonomy” is indistinguishable from “guessing.” The template makes bounded autonomy operable, not just theoretical.
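The template above can be made operable as a small data structure plus an escalation check. This is a sketch under assumptions: the field names mirror the template, and the refund task, its thresholds (confidence 0.8, amount 200), and the context keys are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AgentTask:
    name: str
    required_inputs: list     # evidence that must exist before acting
    success_criteria: str     # what constitutes completion
    stop_conditions: list     # predicates: pause + escalate when any fires
    error_budget: float       # tolerated error rate before rollback

def should_escalate(task: AgentTask, context: dict) -> bool:
    """Pause and hand off to a human if a required input is missing
    or any stop condition fires."""
    if any(inp not in context for inp in task.required_inputs):
        return True
    return any(cond(context) for cond in task.stop_conditions)

# Hypothetical refund task with two stop conditions.
refund = AgentTask(
    name="refund",
    required_inputs=["purchase_record", "refund_policy_version"],
    success_criteria="refund issued and CRM updated",
    stop_conditions=[
        lambda ctx: ctx.get("confidence", 1.0) < 0.8,
        lambda ctx: ctx.get("amount", 0) > 200,
    ],
    error_budget=0.02,
)

ctx = {"purchase_record": "...", "refund_policy_version": "v3", "amount": 500}
print(should_escalate(refund, ctx))  # True — amount exceeds the threshold
```

Because stop conditions are plain predicates, they can be reviewed, versioned, and tested like any other policy artifact.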

DISCIPLINE 2

Tool permissions: agents need access badges

Agents derive real power from tools: CRM updates, ticketing systems, document repositories, internal APIs, financial systems, identity stores. If you give an agent broad credentials to “make it work,” you have created a single point of failure with velocity and scale.

OWASP explicitly calls out identity and privilege abuse as a top agentic risk category — and the consistent mitigation is least privilege and tightly scoped capabilities. Not as an afterthought. As the default starting point.

A THREE-TIER PERMISSIONS MODEL

TIER | ACTIONS | DEFAULT POLICY
Tier 1 — Safe | Read-only, drafting, recommendations | Fully autonomous
Tier 2 — Controlled | Write actions with constraints: field updates, tickets, templated sends | Guardrails + sampling review
Tier 3 — High Impact | Money movement, contracts, access grants, sensitive data, production changes | Explicit approval or dual control required

This turns “human-in-the-loop” from a slogan into a designed mechanism. Humans are engaged at the right tier — not as rubber-stamps on everything, and not absent from things that matter.
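A tiered permissions gate is a few lines of code once the tiers are explicit. The sketch below assumes a simple tool-to-tier mapping; the tool names are illustrative, and in practice the mapping would live in the agent register rather than in code.

```python
from enum import IntEnum

class Tier(IntEnum):
    SAFE = 1          # read-only, drafting, recommendations
    CONTROLLED = 2    # constrained writes: field updates, tickets
    HIGH_IMPACT = 3   # money movement, contracts, access grants

# Hypothetical tool-to-tier mapping; real mappings belong in the agent register.
TOOL_TIERS = {
    "crm.read": Tier.SAFE,
    "crm.update_field": Tier.CONTROLLED,
    "payments.refund": Tier.HIGH_IMPACT,
}

def authorize(tool: str, approved_by_human: bool = False) -> bool:
    """Least privilege by default: unknown tools are denied,
    Tier 3 actions require explicit human approval."""
    tier = TOOL_TIERS.get(tool)
    if tier is None:
        return False  # deny by default
    if tier == Tier.HIGH_IMPACT:
        return approved_by_human
    return True

print(authorize("payments.refund"))                          # False
print(authorize("payments.refund", approved_by_human=True))  # True
```

Note the deny-by-default branch: an agent asking for a tool that was never registered is a design gap, not something to wave through.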

DISCIPLINE 3

Audit trails and observability: evidence is the product

In agentic work, auditability is not optional. It is how you preserve trust, accelerate debugging, satisfy regulators, and have defensible conversations when something goes wrong.

At minimum, you need traceability across four dimensions:

MINIMUM AUDIT REQUIREMENTS

WHAT IT SAW | Inputs, context, retrieved documents
WHAT IT DECIDED | Rationale, classification, confidence signals
WHAT IT DID | Tool calls, parameters, side effects
WHAT CHANGED | Records updated, messages sent, transactions triggered
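The four dimensions translate directly into a structured log entry. This is a minimal sketch; the field values below are invented examples, and a production system would ship these records to an append-only store rather than printing them.

```python
import datetime
import json

def audit_record(saw, decided, did, changed) -> dict:
    """One structured entry covering the four minimum audit dimensions."""
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "what_it_saw": saw,          # inputs, context, retrieved documents
        "what_it_decided": decided,  # rationale, classification, confidence
        "what_it_did": did,          # tool calls, parameters, side effects
        "what_changed": changed,     # records updated, messages sent
    }

# Hypothetical entry for a single refund decision.
entry = audit_record(
    saw={"request_id": "R-1", "retrieved_docs": ["refund_policy_v3"]},
    decided={"classification": "eligible", "confidence": 0.93},
    did=[{"tool": "payments.refund", "params": {"amount": 40}}],
    changed=["crm.ticket/R-1", "email.customer"],
)
print(json.dumps(entry, indent=2))
```

Recording all four dimensions per action is what makes the later question — “what exactly did the agent do, and who approved that?” — answerable from evidence rather than memory.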

DESIGN FOR CONTAINMENT, NOT JUST LOGGING

Logging is table stakes. The harder — and more important — question is blast radius: if the agent makes a bad call at scale, can you limit damage?

→ Can you pause the agent immediately without manual intervention?

→ Can you roll back its actions before they propagate downstream?

→ Can you detect abnormal tool sequences, not just abnormal outputs?

→ Can you trace the full tool chain — not only the model’s final response?

OWASP’s agentic risk work points directly at tool misuse patterns: legitimate tools used in unsafe combinations. Your observability must cover the chain of tool calls, not only what the model said.
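One way to operationalise containment is a circuit breaker over the tool-call stream. The sketch below is illustrative: the call limit, the "forbidden pair" of tool names, and the pause semantics are all assumptions, standing in for whatever anomaly rules fit your workflows.

```python
class CircuitBreaker:
    """Pause the agent when the tool-call stream looks abnormal.
    Thresholds and the forbidden-pair list are illustrative."""

    # Hypothetical unsafe combination: granting access then moving money.
    FORBIDDEN_PAIRS = {("identity.grant_access", "payments.refund")}

    def __init__(self, max_calls_per_run: int = 20):
        self.max_calls = max_calls_per_run
        self.calls = []
        self.paused = False

    def record(self, tool: str) -> bool:
        """Log a tool call; return False once the agent is paused."""
        self.calls.append(tool)
        if len(self.calls) > self.max_calls:
            self.paused = True   # runaway loop: too many calls in one run
        if len(self.calls) >= 2 and \
           (self.calls[-2], self.calls[-1]) in self.FORBIDDEN_PAIRS:
            self.paused = True   # legitimate tools, unsafe combination
        return not self.paused

breaker = CircuitBreaker()
breaker.record("identity.grant_access")
print(breaker.record("payments.refund"))  # False — agent paused
```

The breaker watches sequences, not single calls — which is exactly the gap the OWASP tool-misuse pattern describes.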

 

DISCIPLINE 4

Accountability and escalation: one owner per agent

Every agent needs a named owner accountable for outcomes — not just uptime. Without ownership, incidents become blame loops: the vendor, the model, the prompt, the integration. The answer is never found because no one was asked to own the question.

A clean operating model assigns four distinct roles:

AGENT OWNERSHIP MODEL

PROCESS OWNER | Owns the business outcome and workflow performance
AGENT OWNER | Owns agent behavior, boundaries, and release cadence
TOOL OWNER | Owns each integration: permissions, schemas, failure modes
RISK PARTNER | Validates evidence, monitoring thresholds, and escalation paths

ISO/IEC 42001 provides a useful backbone for formalizing this as a management system — with continual improvement, defined accountability, and structured processes. Not bureaucracy: operational clarity at scale.

The Hybrid (Human + AI) operating model blueprint: how to implement agent management

Here is a pragmatic blueprint you can apply without turning it into a bureaucratic program.

Step 1: Build an “Agent Register” for production, not for slideware

For each agent, capture:

  • Purpose and scope

  • Allowed actions (Tier 1 to Tier 3)

  • Tools and permissions

  • Stop conditions and escalation paths

  • Owner, approvers, release cadence

  • Evidence requirements (what must be captured for audits and investigations)

This is the operating model equivalent of an org chart. If you cannot list your agents, you cannot manage them.
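A register entry can be as simple as a typed record. The sketch below is one possible shape, not a standard schema; the refund-agent values, role names, and tool identifiers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AgentRegisterEntry:
    name: str
    purpose: str
    allowed_tiers: list       # e.g. [1, 2] — Tier 3 always needs approval
    tools: list               # tools and permissions actually granted
    stop_conditions: list     # when the agent must pause and escalate
    owner: str                # named owner accountable for outcomes
    approvers: list
    release_cadence: str
    evidence_required: list   # what must be captured for audits

# Hypothetical register with one production agent.
REGISTER = [
    AgentRegisterEntry(
        name="refund-agent",
        purpose="Handle customer refund requests end-to-end",
        allowed_tiers=[1, 2],
        tools=["crm.read", "crm.update_field", "payments.refund"],
        stop_conditions=["amount > 200 EUR", "fraud signal", "low confidence"],
        owner="finance-ops-lead",
        approvers=["finance-controller"],
        release_cadence="weekly",
        evidence_required=["inputs", "decision rationale", "tool calls"],
    ),
]

print([a.name for a in REGISTER])  # ['refund-agent']
```

Whether this lives in code, a database, or a governance tool matters less than that it exists and is queryable.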

Step 2: Define decision rights at the edges

Decide explicitly:

  • What the agent can do autonomously

  • What requires human approval

  • What requires dual control

  • What is prohibited

Then design the workflow around those decision rights so humans are not asked to rubber-stamp.

Step 3: Implement telemetry that proves value and control

Track dual metrics:

Value metrics

  • Cycle time reduction

  • Cost-to-serve

  • Throughput per team

  • Conversion, retention, NPS (where relevant)

Control metrics

  • Escalation rate

  • Override rate

  • Exception backlog

  • Incident frequency and severity

  • Drift indicators (performance changes over time)

  • Tool-call anomalies

If your telemetry is only “usage”, your transformation will fail the executive test: “Show me business impact and control.”
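Control metrics fall out of a simple event log once the event shape is agreed. The sketch below assumes a minimal, hypothetical event format with an `outcome` field and an `overridden` flag; real telemetry would carry far more context.

```python
def control_metrics(events: list) -> dict:
    """Compute escalation and override rates from a simple event log.
    Assumed event shape: {"outcome": "completed" | "escalated",
    "overridden": bool}."""
    total = len(events)
    if total == 0:
        return {"escalation_rate": 0.0, "override_rate": 0.0}
    escalated = sum(e["outcome"] == "escalated" for e in events)
    overridden = sum(e.get("overridden", False) for e in events)
    return {
        "escalation_rate": escalated / total,
        "override_rate": overridden / total,
    }

# Illustrative log: four agent runs, one escalation, one human override.
log = [
    {"outcome": "completed", "overridden": False},
    {"outcome": "escalated", "overridden": False},
    {"outcome": "completed", "overridden": True},
    {"outcome": "completed", "overridden": False},
]
print(control_metrics(log))  # {'escalation_rate': 0.25, 'override_rate': 0.25}
```

Tracked over time, a rising override rate is an early drift indicator; a falling escalation rate on a Tier 2 agent may mean guardrails are too loose, not that quality improved.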

Step 4: Create a lightweight change cadence

Agents change faster than classic enterprise software. Your operating cadence should include:

  • Pre-release scenario tests (happy path and edge cases)

  • Post-release monitoring and sampling review

  • Incident review, with learning captured as design changes

  • Rollback procedures that work under pressure

This is how you scale without fear.

Why this matters in Europe: governance expectations are rising

The EU AI Act is fully applicable from 2 August 2026, with earlier obligations already applicable, including prohibited practices and AI literacy from 2 February 2025, and governance rules plus obligations for general-purpose AI from 2 August 2025.

The practical implication is simple: if your agents can act, you should be ready to show governance, accountability, and evidence.

Agent management is how you get there without slowing down.


WORKING EXAMPLE

The Refund Agent

An AI agent handles refunds end-to-end: reads the request, verifies purchase data, applies policy, triggers the refund via payment API, updates the CRM, notifies the customer.

If you treat this as a productivity win, you miss the operating model redesign it requires. Run it properly and every discipline applies:

  • Task decomposition: triage, decision, execution are separate stages
  • Permission tiers: auto-refunds up to €X; above that, approval required
  • Audit trail: policy version, evidence used, tool calls, record changes
  • Accountability: one owner responsible for outcomes and error budget
  • Escalation: fraud signals, ambiguity, and exceptions route to humans
  • Containment: agent can be paused and actions rolled back immediately

Multiply this across HR, finance ops, procurement, and sales ops. That is why agentic AI changes org design — not just productivity.
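The refund agent's three stages can be sketched end-to-end. This is a toy walkthrough, not the implementation: the 100 EUR auto-refund limit, the field names, and the stubbed execution stage are all assumptions made for illustration.

```python
def handle_refund(request: dict, approve_fn=None) -> str:
    """Illustrative refund flow with triage, decision, and execution
    as separate stages. AUTO_LIMIT is a hypothetical Tier boundary."""
    AUTO_LIMIT = 100  # auto-refund threshold in EUR (assumed)

    # Stage 1 — triage: escalate on fraud signals or missing evidence.
    if request.get("fraud_signal") or "purchase_record" not in request:
        return "escalated"

    # Stage 2 — decision: above the limit, a human must approve.
    if request["amount"] > AUTO_LIMIT:
        if approve_fn is None or not approve_fn(request):
            return "pending_approval"

    # Stage 3 — execution: trigger refund, update CRM, notify (stubbed).
    return "refunded"

print(handle_refund({"purchase_record": "...", "amount": 40}))   # refunded
print(handle_refund({"purchase_record": "...", "amount": 500}))  # pending_approval
```

Separating the stages is what makes each discipline attachable: permissions gate stage 3, stop conditions live in stages 1 and 2, and every transition is an audit event.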

Simulate before you scale

Most organizations discover operating model failures after rollout: exceptions explode, approvals become rubber-stamps, agents get over-permissioned to keep velocity, evidence trails are incomplete.

A Digital Twin of the Organisation (DTO) preview lets you stress-test human-agent handoffs, exception paths, decision rights, and blast radius — before a single agent touches production. The best starting point is one workflow with real operational load: customer support, finance ops, HR, or sales ops.

An AI Registry preview: if you cannot list your agents, you cannot manage them. The AI Registry gives you a live inventory of every agent in production — purpose, permissions, owners, audit status, and risk tier — in one place.

FAQs
What is agent management?

Agent management is the operating discipline for designing, governing, and running AI agents that can act autonomously. It covers four areas: task decomposition (defining where agents can act and where they must stop), tool permissions (least-privilege access tiered by impact), audit trails (full traceability of inputs, decisions, tool calls, and outcomes), and accountability (named ownership and escalation paths for every agent in production).

Why do AI agents require an operating model change, not just governance controls?

Because once agents act, the questions that arise — who owns outcomes, what evidence exists, what can the agent touch — are org design questions, not just compliance ones. Decision rights, escalation paths, and ownership structures must be designed into how work runs, not appended as a controls layer afterward.

What are the biggest risks with agentic AI?

Tool misuse and identity or privilege abuse are the most consistently cited production risks — particularly when agents have broad tool access combined with a high degree of autonomy. Over-permissioning is the most common root cause: it is far easier to grant broad credentials than to design tightly scoped access, especially when teams are under pressure to ship.

What is a "stop condition" for an AI agent?

A stop condition is a pre-defined trigger that causes an agent to pause and escalate to a human rather than continuing to act. Examples include: confidence falling below a threshold, an action classified as Tier 3 (high impact), a fraud signal in an incoming request, or an output that does not match expected completion criteria. Stop conditions convert “bounded autonomy” from a concept into a designed mechanism.

What is a Digital Twin of the Organisation (DTO) and how does it apply to agentic AI?

A DTO is a simulation model of an organization’s workflows, decision points, and human-agent interactions. Applied to agentic AI, it lets teams test human-agent handoffs, exception paths, escalation rules, and blast radius scenarios before agents are deployed in production — reducing the risk of discovering failure modes at scale.