The Hallucination Defense

Why logs make 'The AI Did It' the perfect excuse

Niki A. Niyikiza
8 min, 1468 words

Categories: Agentic Security

“The AI hallucinated. I never asked it to do that.”

That’s the defense. And here’s the problem: it’s often hard to refute with confidence.

A financial analyst uses an AI agent to “summarize quarterly reports.” Three months later, forensics discovers the M&A target list in a competitor’s inbox. The agent accessed the files. The agent sent the email. But the prompt history? Deleted. The original instruction? The analyst’s word against the logs.

Without a durable cryptographic proof binding the human to a scoped delegation, “the AI did it” becomes a convenient defense. The agent can’t testify. It can’t remember. It can’t defend itself.


Logs Aren’t Proof

“But we log everything. We have OAuth logs.”

Most production agent systems do log a lot, and that’s good practice. Logs give visibility into what happened, when, and which component did it:

2026-01-15T14:32:01Z agent=research-bot action=file_read path=/data/ma/target-corp.pdf
2026-01-15T14:32:03Z agent=research-bot action=email_send to=external@competitor.com

With the right setup (append-only storage, signed timestamps, retention controls), logs can be tamper-evident. They can be excellent evidence that an event occurred inside your system.
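The "tamper-evident" part typically comes from chaining: each entry commits to the hash of the previous one, so rewriting history invalidates everything after the edit. A minimal stdlib-only sketch (illustrative, not a production design, and not any specific logging product):

```python
import hashlib
import json
import time

class HashChainedLog:
    """Append-only log where each entry commits to its predecessor's hash."""

    def __init__(self):
        self.entries = []
        self.head = "0" * 64  # genesis value

    def append(self, event: dict) -> dict:
        entry = {"ts": time.time(), "event": event, "prev": self.head}
        # Hash the entry body (timestamp, event, previous hash) deterministically.
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        entry["hash"] = digest
        self.entries.append(entry)
        self.head = digest
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edit to past entries breaks the chain."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: e[k] for k in ("ts", "event", "prev")}
            if e["prev"] != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

Note what this does and doesn't give you: it proves the log wasn't rewritten after the fact, but says nothing about who authorized the events it records.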

But in disputes, the question is rarely “did something happen?” It’s:

Who authorized this class of action, for which agent identity, under what constraints, for how long, and how did that authority flow?

A common failure mode in agent incidents is not “we don’t know what happened,” but:

We can’t produce a crisp artifact showing that a specific human explicitly authorized the scope that made this action possible.

This gap gets wider in multi-agent systems:

  • A human authorizes an orchestrator.
  • The orchestrator spawns sub-agents.
  • Sub-agents call plugins, third-party services, or external runtimes.
  • The final action executes somewhere that may not share your identity domain, your audit system, or your policy engine.

In that world, logs can still show: “a valid session existed” and “a component with access acted.” But it becomes harder to show, with a single verifiable chain, that the final actor was operating under a scope the human actually delegated, rather than under a generic session token, a broad integration credential, or inferred intent.

This isn’t a dismissal of logging, approvals, policy engines, or token hardening. It’s an argument that accountability needs one more artifact: independently verifiable authorization evidence that survives multi-hop execution.

That’s the liability gap: between “we recorded an event” and “we can produce a verifiable delegation chain for it.”

Without warrants: no proof of who authorized. With warrants: signed chain from user to tool with cryptographic receipt


Authorization as a First-Class Artifact

When real money moves, institutions don’t rely on “someone had a session.” They require explicit authorization steps (step-up authentication, approvals, dual control, callbacks) and keep durable records of the authorization decision. In inter-organization rails, messages are authenticated so participants can verify who sent what within that rail.

Not every bank user personally applies a cryptographic signature to every instruction, but there is a more general point:

In high-stakes systems, the unit of accountability is the action and its authorization record, not a long-lived session.

The check system, for all its well-documented flaws, is still interesting because it treats authorization as an artifact you can present later, not a session you have to reconstruct. In a loose, pre-cryptographic way, it gestures at two properties we want for agent delegation.

First, designated negotiation. Checks are addressed to a payee, and endorsement/deposit rules attempt to control who can successfully negotiate the instrument and where. Restrictive endorsements (“for deposit only…”) are a crude procedural attempt at holder binding. It’s not cryptographic enforcement, but the shape is right: an authorization artifact meant for a particular holder or route, rather than a replayable credential.

Second, non-amplification. Checks instruct settlement against scarce funds. You can write many checks, but settlement ultimately reconciles against a limited balance (or credit line). Failure may be detected late, but delegation doesn’t create value.

Tenuo Warrants apply both ideas to agent actions with modern enforcement: a warrant is holder-bound to a specific agent key, and attenuable so delegated scope can only narrow as it flows downstream.
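The attenuation rule can be stated as a subset check: a delegated warrant is valid only if every dimension of its scope fits inside its parent's. A hypothetical sketch, with field names that are illustrative rather than Tenuo's actual API:

```python
def attenuates(parent: dict, child: dict) -> bool:
    """True iff the child warrant's scope is contained in the parent's.

    Every check is a narrowing: delegation can shrink authority,
    never amplify it. Field names here are assumptions for illustration.
    """
    return (
        child["capability"] == parent["capability"]       # same verb, no escalation
        and child["max_amount"] <= parent["max_amount"]   # limit can only shrink
        and child["expires"] <= parent["expires"]         # lifetime can only shorten
        and child["recipients"] <= parent["recipients"]   # set may only narrow
    )
```

This is the "non-amplification" property in code form: no combination of sub-delegations can produce a scope the original signer didn't authorize.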

And this is the non-repudiation point: if delegation is going to cross tools and sub-agents, you need a durable artifact you can show later that answers who authorized what.

But in agent systems we authenticate a session (“Bob is logged in”) and then infer intent from a mixture of logs, prompts, and downstream effects. That works until it doesn’t, especially when an incident involves ambiguous delegation paths, third-party tools, or autonomous sub-agents.

OAuth is great at what it’s designed to do: delegating access and expressing scopes at the token level. But a bearer token is a portable credential: whoever holds it can use it. You can reduce replay risk with sender-constrained tokens (mTLS, DPoP), but even then a primitive is missing:

Where is the action-level authorization artifact that says:

“This human authorized this agent identity to perform this class of operations within these constraints for this duration”?


Warrants: Signed Authorization for Every Action

A Tenuo warrant is a cryptographic, scoped, time-bound authorization object that can be verified independently of the agent runtime and that remains meaningful across multi-hop delegation.

from datetime import timedelta

# Human signs the authorization (via Passkey/WebAuthn, not manual key management)
warrant = Warrant.mint(
    issuer=alice_passkey,
    holder=agent_public_key,
    capability="file_read",
    constraints={"path": Subpath("/data/reports")},
    ttl=timedelta(hours=1),
)

When the agent reads a file, it presents this warrant. The file server validates the signature, checks the constraints, and produces a receipt that pairs authorization evidence with the action metadata.

A verifier checks:

  • Issuer signature (who authorized)
  • Holder binding (the caller proves possession of the agent key named in the warrant)
  • Capability + constraints + expiry (what was allowed, within which bounds, for how long)
  • Delegation chain (how authority flowed across hops, including whether the agent was allowed to delegate)
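The checks above can be sketched as a single verification function. This is a stdlib-only illustration: HMAC-SHA256 stands in for Ed25519 signatures, the field names are assumptions rather than Tenuo's wire format, and delegation-chain walking is omitted (each hop would repeat these checks against its parent warrant):

```python
import hashlib
import hmac
import time

def sign(key: bytes, payload: bytes) -> str:
    """Toy stand-in for an Ed25519 signature."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_warrant(warrant: dict, action: dict, caller_key: bytes):
    # 1. Issuer signature: who authorized this warrant
    if not hmac.compare_digest(
        sign(warrant["issuer_key"], warrant["payload"]), warrant["signature"]
    ):
        return False, "bad issuer signature"
    # 2. Holder binding: caller must prove the key named in the warrant
    if caller_key != warrant["holder_key"]:
        return False, "caller is not the named holder"
    # 3. Capability, constraints, expiry
    if action["tool"] != warrant["capability"]:
        return False, "capability mismatch"
    if not warrant["constraint"](action):
        return False, "constraint violated"
    if time.time() > warrant["expires_at"]:
        return False, "warrant expired"
    return True, "ok"
```

Every branch returns a specific denial reason, which is exactly the material a receipt needs to record.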

The receipt captures:

  • Alice’s signature in the warrant (cryptographic proof of authorization)
  • The constraints (cryptographic proof of authorized scope)
  • Validation time (evidence of when it was authorized/accepted)
  • Action metadata (evidence of what was requested/executed, depending on what you record)

Warrant receipt structure
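A hypothetical shape for such a receipt, matching the four bullets above (field names are illustrative, not Tenuo's actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Receipt:
    """Immutable pairing of authorization evidence with action metadata."""
    warrant_signature: str   # the authorizer's signature over the warrant payload
    constraints: dict        # the authorized scope, exactly as signed
    validated_at: float      # when the verifier accepted the warrant
    action: dict             # what was requested/executed, as recorded
```

Freezing the dataclass mirrors the key property: a receipt is a record you present later, not state you mutate.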

Logs describe. Receipts prove.


The Attack, Replayed

Same defense, different scenario. An analyst wants to process a batch of vendor invoices. It’s easier to sign one warrant with a high limit and let the agent handle the rest than to approve each transfer individually.

The warrant their passkey signed at 3:12 PM:

tool: transfer
amount: range(0, 50000)
to: *
ttl: 3600

Every other analyst that day processed similar batch sizes. They signed 12-15 warrants each:

tool: transfer
amount: range(0, 500)
to: vendors/approved/*
ttl: 60

Three months later, forensics flags a $48,000 transfer to an external account mixed in with the batch.

Analyst’s defense: “The AI hallucinated. I was just trying to be efficient.”

Your response: Everyone else processed the same volume with task-scoped warrants. You signed one that authorized 100x the limit, to any recipient, for an hour. You signed it.

The receipt answers what logs can’t: what did you choose to allow?
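To make the comparison concrete, here is a sketch of how those two constraint sets could be evaluated against the flagged transfer. The range and glob semantics are assumptions about how such constraints might work, not Tenuo's actual matching rules:

```python
from fnmatch import fnmatch

def allows(warrant: dict, transfer: dict) -> bool:
    """Check a transfer against an amount range and a recipient glob."""
    lo, hi = warrant["amount"]
    return lo <= transfer["amount"] <= hi and fnmatch(transfer["to"], warrant["to"])

broad  = {"amount": (0, 50_000), "to": "*"}                   # the analyst's warrant
scoped = {"amount": (0, 500),    "to": "vendors/approved/*"}  # everyone else's

flagged = {"amount": 48_000, "to": "external/attacker"}
```

The broad warrant admits the flagged transfer; the scoped one deterministically rejects it on both dimensions. That asymmetry is the receipt's answer.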


“But What About Prompt Injection?”

If an attacker hijacks the agent mid-session, doesn’t that break accountability?

Warrants don’t magically stop prompt injection. They make the blast radius explicit and the authorization undeniable.

Constraints limit what can happen. The warrant says Subpath("/data/reports"). If the injection tries to read /etc/shadow, it will be deterministically denied. The capability doesn’t exist, regardless of what the prompt says.
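A path-containment check of this kind can be done lexically. The sketch below is illustrative; Tenuo's actual Subpath semantics may differ. Normalizing "." and ".." before comparing is what makes the denial deterministic even against traversal tricks:

```python
from pathlib import PurePosixPath

def within(root: str, candidate: str) -> bool:
    """True iff candidate resolves (lexically) inside root."""
    root_parts = PurePosixPath(root).parts
    parts = []
    for p in PurePosixPath(candidate).parts:
        if p == "..":
            if parts:
                parts.pop()        # traversal steps are resolved, not trusted
        elif p != ".":
            parts.append(p)
    return tuple(parts[: len(root_parts)]) == root_parts
```

No matter what the prompt injects, `/etc/shadow` (or `/data/reports/../../etc/shadow`) never lands inside `/data/reports`.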

The attack succeeds. The action doesn’t.

Receipts prove what was authorized. If something did happen, the warrant chain answers who signed off on the scope that allowed it.

Approval is explicit. The UI doesn’t say “Authorize Agent.” It says “Authorize Agent to read /data/reports for 1 hour.”

Broad authorization is a choice. A choice you sign. A choice you own.

Warrants are both a guardrail (prevention via constraints) and a receipt (accountability via signatures).


“What If the Signing Device Is Compromised?”

If a passkey is stolen, you have a crime scene. The attacker had to compromise a specific device. You know which one, when, and what it signed. The forensic trail points somewhere.

If an OAuth token is stolen, you have a ghost. Bearer tokens have no proof of possession: whoever holds it is authorized. It works from anywhere. Logs show what happened, but nothing ties the action to a device, a user, or a moment of intent.

A log is an assertion by your system. A receipt is a statement signed by the authorizer.


Trust the Math

Prompt filters don’t take the stand. When the breach happens, when the subpoena lands, when the regulator asks “prove this was authorized,” you don’t want to explain your prompt engineering strategy.

Signatures bind humans to actions. Holder binding makes stolen warrants useless. Constraints limit blast radius. None of it requires trusting the model.

You want receipts.


Tenuo is an open-source authorization framework for AI agents. Ed25519 signatures, capability-based delegation, 27μs verification.

Deploying agents in production? Let’s talk.