Much ado about nothing: why AI agents keep failing in banking

Author: Macgregor Duncan
Date: Jan 29, 2026
Reading time: 5 min
Topics: AI, Retail

A lot of AI noise

In 2025, almost every bank refreshed its AI strategy. Banks launched AI experiments and agentic AI POCs. Press releases promised step-changes in customer support, operational efficiency and compliance automation. And yet, to date, banks have very little to show.

Most AI deployments in banks remain peripheral and stuck in innovation labs. Virtually no agentic use cases have been promoted into production to run regulated workflows. The gap between promise and impact remains wide. This isn’t because banks are conservative, or because regulators are blocking progress. It’s because expectations placed on AI are often unrealistic.

You can’t expect autonomous agents to perform alchemy. No model — however capable — can reliably turn fragmented data, inconsistent systems and invisible decision-making into coherent outcomes. You can’t turn dust into gold.

The obvious problem: fragmented data

The most obvious failure is poor data. This is well understood — and largely unsolved in most banks. Banks struggle with AI because their data is fragmented across dozens of systems of record, typically provided by different vendors, each with its own schema, identifiers and implicit assumptions.

In many banks, there is no single canonical source of truth with consistent definitions. Without a canonical data model imposing consistency, AI systems cannot reason reliably and pull the right value from the right system at the right time. The foundations are unstable. This is why many AI initiatives stall before they reach production.

But while this unglamorous problem is well understood, it’s not even the most important reason why AI agents struggle in the context of banking.

Why fixing data isn’t enough

Even if a bank fixes its data model — unified customers, accounts, products, risk, compliance — most AI systems will still fail. Because banks don’t run on rules and data alone. They also run on judgement.

A significant portion of operational processes and decisions in a bank involve exceptions: think of credit overrides, risk escalations, compliance judgements and customer-specific treatments. These are not edge cases. They're how banks function. How and why these decisions get made is never captured in simple process documentation and systems of record. They live in emails, Slack threads, and often in people's heads.

So even when the final decision is well recorded, the reasoning behind it is not. That leaves no recorded precedent for how many of these decisions get made. The data you need for high-value AI use cases is simply missing.

Why AI agents fail without context

This explains why most agentic AI approaches struggle in regulated environments. AI agents without context aren't trustworthy: they can't navigate exceptions reliably, balance competing risks, explain their actions to regulators, or learn from human correction in a durable way.

An agent that sees only outcomes, not how decisions get made and weighed, cannot improve meaningfully. There’s no feedback loop and no precedent.

To solve this challenge and build decision memory into their agents, AI companies must own the full execution path and capture why decisions get made, building up precedent that improves future decision-making.

The missing layer: decision infrastructure

A recent post from US VC Foundation Capital makes this point well. Their argument is that the next generation of enterprise software will record not just what happened, but why it happened.

They describe this as a context graph: a living, queryable record of decision traces which serves as a set of precedents to guide future decision-making. This is not a model’s internal chain-of-thought expressed through a prompt, but a form of organizational memory:

  • what decision was made

  • by whom (human or agent)

  • based on what context

  • considering what factors

  • referencing which prior cases or precedents

  • with what approvals or overrides

Over time, precedent becomes searchable. The "feel" for when and how context and exceptions should be applied becomes something models can learn.
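The decision-trace fields listed above can be sketched as a minimal data structure. This is an illustrative assumption, not Foundation Capital's or any vendor's actual schema; all names are invented:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionTrace:
    """One entry in a hypothetical context graph: what was decided, and why."""
    decision: str                 # what decision was made
    actor: str                    # by whom (human or agent)
    context: dict                 # based on what context
    factors: list                 # considering what factors
    precedents: list = field(default_factory=list)  # prior cases referenced
    approvals: list = field(default_factory=list)   # approvals or overrides
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

def find_precedents(traces, factor):
    """Precedent becomes searchable: past decisions that weighed a given factor."""
    return [t for t in traces if factor in t.factors]

trace = DecisionTrace(
    decision="approve credit override",
    actor="human:credit_officer_17",
    context={"customer_id": "C123", "limit_requested": 25_000},
    factors=["repayment_history", "income_stability"],
    approvals=["manager:ops_lead_04"],
)
matches = find_precedents([trace], "repayment_history")
```

The point of the sketch is that each record captures the reasoning around a decision, not just its outcome, so future decisions can be grounded in retrieved precedent.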

This is not a “nice to have” for banks. It's a prerequisite for trust, explainability, auditability and regulation-grade AI.

The execution gap

Even with judgement addressed, a hurdle remains: execution. Models can recommend, summarise, flag risk and reach decisions, but they lack a safe way to act. There is no governed way to commit actions, enforce approvals, or reverse mistakes. Without managed execution paths, AI stays in read-only mode and agent-based systems stall.

This is why protocols like the Model Context Protocol (MCP) matter. They give AI structured, governed access to workflows so it can act with limits, traceability, and rollback. Without them, AI remains advisory rather than operational.
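A governed execution path of this kind might look like the following sketch. It is an assumed wrapper, not a real MCP implementation: actions commit only with approval, every attempt is logged for traceability, and each committed action keeps an inverse for rollback:

```python
class ApprovalRequired(Exception):
    """Raised when an action is attempted without sign-off."""

class GovernedExecutor:
    """Hypothetical governed action layer: approvals, traceability, rollback."""

    def __init__(self):
        self.audit_log = []   # traceability: every attempt is recorded
        self.undo_stack = []  # rollback: inverses of committed actions

    def commit(self, action, apply, undo, approved_by=None):
        """Commit an action only with approval; keep its inverse for rollback."""
        self.audit_log.append((action, approved_by))
        if approved_by is None:
            raise ApprovalRequired(f"'{action}' needs human approval")
        apply()
        self.undo_stack.append((action, undo))

    def rollback(self):
        """Reverse the most recently committed action."""
        action, undo = self.undo_stack.pop()
        undo()
        self.audit_log.append((f"rollback:{action}", "system"))

state = {"credit_limit": 10_000}
ex = GovernedExecutor()
ex.commit(
    "raise credit limit to 15000",
    apply=lambda: state.update(credit_limit=15_000),
    undo=lambda: state.update(credit_limit=10_000),
    approved_by="risk_officer_2",
)
ex.rollback()  # restores the previous limit
```

The design choice worth noting is that the undo path is supplied at commit time, so every action that changes state is reversible by construction rather than by after-the-fact cleanup.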

Trust and observability, the hidden requirement

Trust does not come from model capability alone. AI systems require continuous monitoring and evaluation to ensure they behave as expected. Performance drift, data changes, policy updates, and edge cases all degrade behaviour over time.

Most banks underestimate this in their early deployments. Observability is an afterthought. Evaluation is manual and fails to scale. Feedback loops are informal.

AI systems must be observable by design: decisions traceable, performance measured continuously against defined risk and quality thresholds. Without this, trust erodes and systems are inevitably rolled back.

Accountability, AI is a business outcome

Another structural failure is business accountability. Banks often treat AI as a tech transformation rather than a business one. It’s commonly led by the chief technology officer, with technical teams running proofs of concept on problems that either do not move the dial, address only a narrow slice of the problem, or fail under production constraints. This is a category error.

AI outcomes are business outcomes. They’re the same outcomes already owned by the chief executive and line managers: customer experience, cost, risk, compliance, growth, and operational capacity.

Until accountability for AI sits with the leaders who own those outcomes, efforts will remain experimental. Tech teams can’t compensate for unclear ownership, misaligned incentives, or missing decision rights.

Operating model, from project to product to platform

Most banks struggle with their operating model for AI. AI is treated as a project with fixed scope, long release cycles, and heavy governance, siloed inside tech functions. Experimentation is slow. Feedback loops are weak. This is a dead end.

AI must be treated as a product, not a project. That requires a fit-for-purpose operating model where product, engineering, and domain experts work together continuously. These teams need to be embedded directly into business units, co-creating applications with operators, risk, and compliance. AI systems must be shaped in the context where decisions are truly made.

 

Talent, the real bottleneck

Banks can’t do it alone. Talent is a hard constraint that many underestimate.
There is a very small global pool of people who know how to deliver AI into regulated production environments reliably. Very few of them work directly for banks.

Many institutions have tried to go solo, assembling solutions from system integrators and generic components, without the specialist experience required to make these systems safe, observable, and resilient.

Deep partnerships with AI vendors that bring forward deployed engineers are becoming essential to access scarce capability, deliver early wins, and transfer knowledge into internal teams.

But embedding forward-deployed engineers is not a scalable platform model in banking. Every bank has different systems, different data models and different exception-handling rules. Crucially, it doesn’t compound, as decision context never becomes a durable precedent.

In short, you can’t just layer an AI company on top of a bank. It's consulting, with an API.

A new model for banks

This points to a different model. One way for agentic AI to work in banking is through a whole-of-bank platform model. The platform must enforce a canonical data model, own the execution pathway around operational decisions, capture decisions at commit time, and retain judgement as durable precedent. The platform must have embedded governance to provide trust and accountability.

Platforms built this way can train agents through real operational feedback loops, with explainability and auditability as native properties.

Constantinople is designed along these lines. By running banking workflows on a unified data model, and by sitting directly in the flow of operational decision-making, Constantinople captures not just outcomes, but the reasons behind them.
This allows our AI systems to improve over time in a way that is trustworthy, regulator-ready and scalable.
