Blog
03 Sep 2025

Explainability: Shining a Light into the AI Black Box

Insights by James White, CTO, CalypsoAI

With GenAI, we now have a good understanding of how answers are generated and given back to us. Put simply, today's generative models are built on the transformer architecture: they choose the most likely prediction for the next word, token by token, and give back the most probable answer based on the information they were trained on.
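
To make that concrete, here is a minimal sketch of next-token prediction using the open-source GPT-2 model via the Hugging Face transformers library. The model choice and prompt are arbitrary, purely for illustration; this is not CalypsoAI code.

    # Minimal next-token prediction sketch with GPT-2 (illustrative only).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The capital of France is"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

    # The model assigns a probability to every token in its vocabulary; generation
    # simply keeps picking (or sampling) the most likely next token and repeating.
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(next_token_probs, k=5)
    for prob, token_id in zip(top.values, top.indices):
        print(f"{tokenizer.decode(int(token_id)):>10}  p={prob.item():.3f}")

Generation is just this step repeated: the chosen token is appended to the prompt and the model predicts again.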

However, when an application or process uses AI or AI agents to carry out a task – choosing candidate resumes, for example – organizations rightly want to know why the decision was made, and how that decision came about. That’s because, in worst-case scenarios, if the how is bad, the outcome may no longer be useful. 

This is the ‘black box’ we hear about in AI, the lack of clarity on how an application or agent completed a task based on the criteria it was given. In the pre-GenAI period, dating back to neural networks, explainability was described in line charts, showing which parts of the information had the greatest impact on the decision being made.

This all becomes much more complex when it comes to GenAI, and agents in particular. A simple agent will have a brain, which is the underlying AI model; it will have a purpose, which is its task(s); and it will have the tools it can interact with to complete its purpose. 

An AI agent doesn't have a moral compass or a lifetime of being taught right from wrong; it doesn’t have an understanding of real-world consequences of its actions. You can't guarantee an agent will do something how you want it to, especially if you want to foster autonomy.

For example, an organization might task an agent to delete a company name, Random Company LLC, from its customer list. The agent may respond, ‘Done, Random Company LLC is no longer in our database’, but it may have achieved that by deleting all customer names starting with ‘R’. While it technically achieved the task, the ‘how’ has serious consequences. 
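
A toy illustration of that gap between 'task completed' and 'task completed correctly', with made-up customer names, is the difference between an exact-match deletion and the careless shortcut described above:

    # Illustrative only; the customer list and logic are hypothetical.
    customers = ["Acme Corp", "Random Company LLC", "Rivertown Foods", "Zenith Ltd"]

    # What the organization intended: remove exactly one named customer.
    intended = [c for c in customers if c != "Random Company LLC"]

    # What a careless agent might actually do: drop everything starting with "R".
    careless = [c for c in customers if not c.startswith("R")]

    print(intended)  # ['Acme Corp', 'Rivertown Foods', 'Zenith Ltd']
    print(careless)  # ['Acme Corp', 'Zenith Ltd'] -- task "done", but data was lost

In both cases the agent can truthfully report that Random Company LLC is gone; only the second version has destroyed data along the way.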

Agentic Fingerprints: Explainability for AI Agents

To understand how an agent carries out its task, we need to understand – after the fact – every thought the agent had, every interaction it had, every decision and action it took. To provide transparency or explainability, we need to be able to break up the answer or outcome that the agentic system gave, and see which parts fell into which category. 

When we know that information, we have visibility of the things the agent attempted that were unsuccessful, and the things that were successful. Even if they're successful, they may not be right, like the example above, where the agent achieved its purpose but in a negative way. 

To achieve that, in CalypsoAI’s Inference Red-Team product, we have developed Agentic Fingerprints, offering full-spectrum visibility into every decision, prompt and pivot made by an autonomous agent. 

With Agentic Fingerprints, the first step is recording everything the agent does. We can see what an agent is doing with its tools easily enough; the challenge comes in identifying what it is thinking through all the iterations of its self-directed chain of thought, and how it ultimately arrives at the answer it decides is right.

Agentic Fingerprints parses all of those chains of thought and gives a full record of the agent’s activity in chronological order. We can see where it peels off into an action, or branches into a new question.
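
To give a feel for what such a chronological record might look like, here is a minimal sketch of an agent trace. The event types, fields and example run are hypothetical, not the actual Agentic Fingerprints schema.

    # A hypothetical chronological trace of an agent's thoughts, tool calls and decisions.
    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class AgentEvent:
        kind: str        # e.g. "thought", "tool_call", "tool_result", "decision"
        detail: str
        timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    trace: list[AgentEvent] = []

    def record(kind: str, detail: str) -> None:
        """Append one step of the agent's activity to the chronological trace."""
        trace.append(AgentEvent(kind, detail))

    # A made-up run, purely to show the shape of the record.
    record("thought", "I need to find the customer record before deleting it.")
    record("tool_call", "search_customers(name='Random Company LLC')")
    record("tool_result", "1 match found")
    record("decision", "Delete the matched record only.")

    for event in trace:
        print(f"{event.timestamp.isoformat()}  {event.kind:<12} {event.detail}")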

That is represented in a decision-tree format, a top-down, exploding node graph of all the thoughts and decisions that got the agent to its outcome. It’s clear which path the agent took to successfully complete its task. Everything else, by definition, is an unsuccessful avenue, a cul-de-sac the agent went down, from which we can draw learnings. 
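
The same idea can be sketched as a simple tree structure, where walking the tree recovers the one successful path and everything else is a dead end. Again, the node format below is illustrative, not the Agentic Fingerprints format.

    # A hypothetical decision tree of agent thoughts and actions.
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        label: str
        succeeded: bool = False
        children: list["Node"] = field(default_factory=list)

    def successful_path(node: Node) -> list[str] | None:
        """Return the chain of labels leading to a successful leaf, if any."""
        if not node.children:
            return [node.label] if node.succeeded else None
        for child in node.children:
            path = successful_path(child)
            if path is not None:
                return [node.label] + path
        return None

    root = Node("Task: remove Random Company LLC", children=[
        Node("Try: delete all names starting with 'R'"),          # dead end
        Node("Try: search for the exact name", children=[
            Node("Delete the single matched record", succeeded=True),
        ]),
    ])

    print(successful_path(root))
    # ['Task: remove Random Company LLC', 'Try: search for the exact name',
    #  'Delete the single matched record']

Every branch that the path does not pass through is one of those cul-de-sacs, preserved for learning rather than discarded.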

For the first time, Agentic Fingerprints gives organizations explainability, a quick and easy understanding of what the Red-Team agents did and how they did it. 

Outcome Analysis: Advancing AI Explainability

When we think about applying GenAI or agentic workflows to real life, if the output is a refusal to complete a request, we should be able to see why that’s the case. For too long, when GenAI has been making decisions, we’ve only been able to see the outcome as simply positive or negative.

We at CalypsoAI were guilty of that. We would provide enterprises with industry-leading, best-in-class security scanners that block malicious traffic based on the configuration for a particular system; the outcome would be a ‘yes’ or a ‘no’.

That's fine in the early stages of a new technology. But as AI is used more and more in production, in real-life scenarios, it's critical that we don't just say, ‘Computer says no’. Our customers, and their customers, deserve a full answer.

Perhaps it's a ‘no’ for legal reasons; perhaps it's a ‘no’ because the user doesn’t have permission to access a particular item on their service plan. Whatever it is, they need to understand why. It can't just be ‘no’.

Within CalypsoAI’s Inference Defend product, Outcome Analysis now answers why something was blocked by the security scanning controls put in place as part of an enterprise AI system. We show an easy-to-understand insight based on the content that was provided, and the specific part that triggered a particular rule. 
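
Conceptually, the difference is between a scanner that returns a bare 'blocked' flag and one that also returns which rule fired and on what content. The sketch below shows that shape with a single made-up PII rule; it is not the Inference Defend API.

    # A simplified, hypothetical scanner that explains its block decisions.
    import re
    from dataclasses import dataclass

    @dataclass
    class ScanOutcome:
        blocked: bool
        rule: str | None = None          # which configured rule fired
        matched_text: str | None = None  # the specific content that triggered it

    RULES = {
        "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # crude PII pattern, illustration only
    }

    def scan(prompt: str) -> ScanOutcome:
        """Return not just 'yes/no' but which rule fired and on what content."""
        for rule_name, pattern in RULES.items():
            match = pattern.search(prompt)
            if match:
                return ScanOutcome(blocked=True, rule=rule_name, matched_text=match.group())
        return ScanOutcome(blocked=False)

    outcome = scan("Customer SSN is 123-45-6789, please summarise the account.")
    print(outcome)
    # ScanOutcome(blocked=True, rule='us_ssn', matched_text='123-45-6789')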

AI systems can be an opaque black box, but Outcome Analysis lifts the lid and lets the light in. Enterprises can now have confidence in why our Inference Defend scanners block certain content or data, the circumstances at the time, and the reason it was blocked.

Our customers are servicing their own customers, and they're going to get asked questions, so we now give them the capability to answer those questions themselves.

Why AI Explainability Matters for Enterprise Adoption

Our responsibility in security is not to be outlandish; security should be in the background keeping things safe, not in the foreground juggling. So when CalypsoAI releases features, they are things that are necessary to support the jugglers front-of-stage: the model developers and organizations doing the headline-grabbing work with AI.

Transparency and explainability are never going to be the juggling act out front, but they are vital and, we would go as far as to say, expected. CalypsoAI’s explainability features feel like things that should exist in order for enterprises to confidently adopt AI, and agents in particular.

If enterprises can understand how and why an agent made its decisions, they can take informed remediation action rapidly if something does go wrong. That's something that's critical for agentic workflows.

With CalypsoAI’s transparency features, we can show that agents are doing real work, how they do it, and why they make their decisions. It's tangible, and it’s the first time organizations can visually understand something that's happening at machine speed in a virtual world.

Watch the Full Conversation

In this video, James dives into this topic with CalypsoAI Communications Lead, Gavin Daly.

To learn more about our Inference Platform, arrange a callback.
