Semantic Attacks: Exploiting What Agents See
The Era of Reality Injection.
In Map/Territory, I covered the agent→tool boundary: what happens when an agent’s string gets interpreted by a system. Path traversal, SSRF, command injection. The execution layer.
This post covers the opposite direction: world→agent.
World → [perception] → Agent → [authorization] → Tool → System
             ^                        ^
         This post               Map/Territory
Not Prompt Injection
Most discussions on agent security today are about prompt injection: tricking the agent into doing something it shouldn’t.
Semantic attacks are different. They trick the agent into seeing something that isn’t there.

Prompt injection targets the agent’s reasoning layer; semantic attacks target the interpretation layer. By the time the agent reasons about what to do, the damage is already done: it’s operating on poisoned input.
| | Prompt Injection | Semantic Attack |
|---|---|---|
| Target | Reasoning | Perception |
| Mechanism | Instruction Override (Hacking the Law) | Perception Spoofing (Hacking the Evidence) |
| Agent state | Conflicted (User vs. System) | Deluded (Map ≠ Territory) |
| Result | Agent violates developer intent | Agent executes developer intent on false targets |
| Defense | Prompt hardening, POLA | Input normalization, validation, POLA |
| Fix | Better Boundaries | Better Instruments |
With prompt injection, the agent must ignore the developer to serve the attacker. With semantic attacks, the agent obeys the developer but gets tricked by the environment.
It’s the difference between a guard being bribed (Prompt Injection) and a guard being shown a fake ID (Semantic Attack). The bribed guard knows they are breaking the rules. The fooled guard thinks they are enforcing them.
The Perception Gap
To understand the vulnerability, you have to look at how agents select targets compared to typical automation approaches.
| Approach | Mechanism | The Guarantee | Attack Surface |
|---|---|---|---|
| CSS Selectors | querySelector('#btn') | Deterministic. You get exactly that element or null. | Small. You control the selector. |
| Semantic Queries | “Click the submit button” | Probabilistic. You get what the model believes matches. | Infinite. The attacker controls the page. |
When you write code, you interact with the structure (DOM). When an agent acts, it interacts with the interpretation (Semantics).
This shift from explicit selectors to semantic interpretation creates the Perception Gap.
The Semantic Attack lives in this gap.
The Definition of Perception: I’m not talking about the network act of fetching a webpage. I mean the cognitive act of normalizing and parsing that page. Reading the bytes is transport; deciding that those bytes represent a “Safe Link” is perception.
The perception layer sees “submit button,” “search field,” or “user instructions”. It doesn’t see raw HTML or Unicode codepoints. It relies on a model to translate raw noise into these clean labels.
If an attacker can manipulate that translation layer (making a “Delete” button semantically resolve to “Save”) they don’t need to hack the brain. They just need to hack the eyes.
This Isn’t Theoretical
The last couple of years gave us a string of CVEs that exploit exactly this gap.
Not all of these are agent-specific. But they demonstrate the same class of vulnerability: validation and execution seeing different realities. As agents proliferate, these attacks will target them directly.
CVE-2025-0411 (7-Zip): Russian cybercrime groups used homoglyph attacks to spoof document extensions, tricking users and Windows into executing malicious files. They used Cyrillic lookalikes to make an archive with .exe files look like .doc. Trend Micro called it “the first occasion in which a homoglyph attack has been integrated into a zero-day exploit chain.” This matters for agents because they rely on file listing tools. If ls or the OS API returns a sanitized string but the filesystem executes the raw bytes, the agent is flying blind just like the user. The file looked safe. The bytes said otherwise.
CVE-2024-43093 (Android): A privilege escalation flaw in Android Framework. The shouldHideDocument function used incorrect Unicode normalization, allowing attackers to bypass file path filters designed to protect sensitive directories. The path string looked restricted. After normalization, it wasn’t. Actively exploited in targeted attacks.
CVE-2025-52488 (DotNetNuke): Attackers crafted filenames using fullwidth Unicode characters (U+FF0E for ., U+FF3C for \). These bypassed initial validation but normalized to standard characters, creating UNC paths that leaked NTLM credentials to attacker-controlled servers. The developers had implemented defensive coding. Normalization happening after validation created the bypass.
CVE-2025-47241 (Browser Use): A domain allowlist bypass in the popular AI browser automation library. The _is_url_allowed() method could be tricked by placing a whitelisted domain in the URL’s userinfo section: https://allowed.com@malicious.com. The agent thought it was visiting an allowed domain. It wasn’t.
GlassWorm (October 2025): A supply chain attack that reached 35,800+ installs of compromised VS Code extensions. The malicious loader code was hidden using invisible Unicode characters, evading security scanners entirely. The code looked clean. The bytes contained a backdoor.
Pillar Security Disclosure (February 2025): Researchers demonstrated attacks on GitHub Copilot and Cursor via poisoned rules files. Hidden Unicode characters embedded malicious prompts that manipulated AI agents into generating vulnerable code. The prompts were invisible during code review and never appeared in chat responses.
The pattern: validation sees one thing, execution sees another. The system isn’t misbehaving. It’s correctly processing poisoned perception.
Attack Taxonomy
I’ve been cataloging these attacks. Five categories keep emerging.
1. Homoglyph Injection (The Doppelgänger)
Cyrillic ‘а’ (U+0430) looks identical to Latin ‘a’ (U+0061). To a human. To a string comparison, they’re completely different.
<!-- Legitimate page -->
<button id="real">Submit Order</button>
<!-- Attacker injects -->
<button id="fake" style="position:absolute; opacity:0.01;">
Submіt Order <!-- Cyrillic 'і' -->
</button>
Agent queries for “submit button.” Model returns both elements. If the agent picks by visual similarity or first match, attacker wins. The invisible overlay captures the click, submits to the attacker’s endpoint.
Your allowlist has "submit_button". The attacker’s element returns "submіt_button" (with Cyrillic ‘і’, U+0456).
Same pixels. Different bytes. Allowlist bypassed.
(Try selecting both strings above. Your cursor will behave differently on the Cyrillic character.)
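The failure is easy to reproduce without any agent in the loop. A minimal Python check on two labels like the ones above:

    latin = "Submit Order"
    spoofed = "Subm\u0456t Order"   # Cyrillic 'і' (U+0456), visually identical to Latin 'i'

    print(latin == spoofed)                                 # False: equality compares codepoints, not pixels
    print(len(latin) == len(spoofed))                       # True: same length, same appearance
    print([hex(ord(c)) for c in spoofed if ord(c) > 0x7F])  # ['0x456'] - the impostor character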
2. Invisible Character Injection
Zero-width characters are invisible but change string identity.
<button>SubmitOrder</button>
<!-- Contains U+200B (zero-width space) between words -->
Visually: “SubmitOrder”
Bytes: Submit\u200BOrder
The agent extracts this text, passes it downstream. Hidden characters corrupt data or trigger injection in systems that don’t expect invisible bytes. Forms get filled with malformed values. Logs become unreadable. Downstream parsers choke.
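The defensive counterpart is cheap. A minimal sketch that strips Unicode format characters (category Cf, which covers the zero-width family) before text is matched or passed downstream:

    import unicodedata

    def strip_invisibles(text: str) -> str:
        # Drop format characters (category "Cf"): zero-width spaces, joiners, BOMs, bidi controls
        return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

    label = "Submit\u200bOrder"                        # what the attacker's button actually contains
    print(label == "SubmitOrder")                      # False
    print(strip_invisibles(label) == "SubmitOrder")    # True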
3. Visual-Semantic Mismatch
What humans see ≠ what the model interprets.
<button class="looks-like-cancel" data-semantic="confirm_payment">
<span style="display:none">Confirm Payment</span>
<span>Cancel</span>
</button>
Human sees: gray “Cancel” button. Model might see: element with hidden text “Confirm Payment,” aria labels, data attributes suggesting confirmation, depending on how it parses the DOM.
If the model weighs semantic attributes over visible text, the agent “cancels” an action but actually confirms it. Phishing pages with semantic misdirection.
This is “Perception Hijacking” in the wild. Attackers manipulate DOM elements to mislead web-browsing agents. The legitimate button gets replaced with a visually identical malicious link. Only the agent’s HTML parser sees the difference.
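One check that helps here, sketched against a hypothetical parsed-element structure rather than any particular library: compare the text a human would see with the semantic name the model is likely to weight, and treat disagreement as hostile.

    def labels_disagree(element) -> bool:
        # element is a hypothetical parsed DOM node with .visible_text, .hidden_text, .attrs
        visible = element.visible_text.strip().lower()
        semantic = (element.attrs.get("aria-label")
                    or element.attrs.get("data-semantic")
                    or element.hidden_text
                    or visible).strip().lower()
        return visible != semantic

    # A button showing "Cancel" but carrying data-semantic="confirm_payment" fails this check.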
4. Statistical Denial of Service (Confidence Flooding)
Flood the page with decoys to dilute confidence on legitimate elements.
<!-- 50 hidden fake search inputs -->
<input type="hidden" aria-label="search input" class="decoy-1">
<input type="hidden" aria-label="search_input" class="decoy-2">
<!-- ... 48 more variations ... -->
<!-- Real search input -->
<input type="text" id="real-search" placeholder="Search...">
Model finds 51 candidates. Confidence distributes across all of them. Real input: 12% confidence. If the agent uses a confidence threshold (say, 80%), no element qualifies. Denial of service. If the agent picks highest confidence anyway, attacker controls which element is “first” via DOM order.
This exploits how models distribute attention, introducing distractors that split confidence, letting the real attack slip through.
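A defensive sketch, assuming the semantic query returns (label, confidence) pairs: refuse to act when candidates are duplicated or nothing clears the threshold, rather than silently taking the top match.

    from collections import Counter

    def select_target(candidates, min_confidence=0.8):
        # candidates: list of (label, confidence) pairs from the semantic query
        if not candidates:
            raise LookupError("no matching element")
        counts = Counter(label for label, _ in candidates)
        label, confidence = max(candidates, key=lambda c: c[1])
        if counts[label] > 1:
            raise LookupError(f"{counts[label]} elements claim to be '{label}': ambiguous, refusing to act")
        if confidence < min_confidence:
            raise LookupError(f"best match only {confidence:.0%} confident: refusing to act")
        return label

    # A flooded page either trips the duplicate check or fails the threshold,
    # instead of letting DOM order pick the winner.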
5. Bidirectional Text Attacks
RTL override characters reverse text display.
<button>rotacidnI eruceS</button>
<!-- U+202E (RTL override) + reversed text + U+202C (pop) -->
<!-- Displays as: "Secure Indicator" -->
Visual: “Secure Indicator”
Bytes: \u202ErotacidnI eruceS\u202C
The model might see the visual rendering (correct) or the raw bytes (garbage). If this text flows into code generation or logging, bidi characters corrupt everything downstream.
This isn’t new. CVE-2021-42574 (Trojan Source) showed bidi characters in code comments can hide malicious logic. We are now seeing the same mechanics deployed against AI agents. The only difference is the victim: yesterday it was a C++ compiler; today it is your autonomous browser.
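Detection is cheap. A minimal sketch that flags bidi control characters before text crosses a trust boundary:

    # Explicit bidi embedding, override, and isolate controls (U+202A–U+202E, U+2066–U+2069)
    BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")

    def contains_bidi_controls(text: str) -> bool:
        return any(ch in BIDI_CONTROLS for ch in text)

    print(contains_bidi_controls("\u202erotacidnI eruceS\u202c"))   # True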
An Attack in Practice
Here’s how these techniques combine.
A browser agent is tasked with paying an invoice. It navigates to the payment page and queries for the “Submit Payment” button. The attacker has injected a second button:
<!-- Legitimate button -->
<button id="real-submit">Submit Payment</button>
<!-- Attacker's overlay -->
<button id="fake-submit" style="position:absolute; opacity:0.01;">
  Submіt Payment <!-- Cyrillic 'і' (U+0456) -->
</button>
<script>
  document.getElementById('fake-submit').onclick = () => {
    document.querySelector('input[name="recipient"]').value = 'attacker@evil.com';
    document.getElementById('real-submit').click();
  };
</script>
The agent’s semantic query returns both elements. Because the homoglyph Submіt is semantically identical to Submit, the model views them as valid candidates. It picks the attacker’s button (perhaps due to higher z-index, DOM order, or just random probability).
The agent clicks it.
The hidden script swaps the recipient, then triggers the real submit.
The payment executes. The audit log shows the agent clicked “Submit Payment” and the payment API returned 200 OK. The agent reports success.
The funds went to the attacker.
Why Current Defenses Fail
| Defense | Why it doesn’t help |
|---|---|
| Output filtering | Attack is on input. Agent believes it’s doing the right thing. |
| Prompt instructions | “Only click legitimate buttons”. Agent thinks it is. |
| Tool allowlists | “submit_button” is allowed. Attacker’s element is labeled “submit_button.” |
| Human approval | Human sees “Click submit_button?”. Looks fine. |
| Sandboxing | Limits damage but doesn’t prevent the wrong click. |
The Ontology Fallacy
There’s a pervasive belief that we can fix perception issues by building better Semantic Layers (Context Graphs and Ontologies) to disambiguate the world.
Semantic layers solve ambiguity (“Does Q3 mean fiscal or calendar quarter?”). They do not solve deception. If an attacker uses a homoglyph to disguise malicious.exe as invoice.pdf, the Semantic Layer ingests it, validates it against the schema, and serves it to the agent as a “Verified Invoice.”
The more we structure the data, the more the agent trusts it. Semantic layers don’t filter perception attacks; they launder them into trusted facts. Garbage in, Gospel out.
Why This Gets Worse
Three architectural trends amplify these attacks.
Memory and RAG. Agents increasingly store extracted content for later retrieval. A homoglyph injected today sits in the vector database until someone’s query surfaces it months later. The input filters that might have caught it at scrape time aren’t present at retrieval time. The Pillar Security disclosure demonstrated this: poisoned rules files with invisible Unicode persisted across sessions, silently corrupting every code generation request.
Multi-agent delegation. When Agent A extracts text and passes it to Agent B, there’s no normalization at the handoff. Agent A’s perception becomes Agent B’s trusted input. A bidi character that A extracted from a webpage flows to B as “clean” data. B acts on it. The audit log shows B followed correct procedures … on corrupted input.
Tool chaining. The output of scrape_webpage becomes the input to summarize_text becomes the input to send_email. Each tool assumes its input is clean. None of them normalize. Invisible characters or homoglyphs accumulate through the chain, corrupting the final output in ways that are nearly impossible to debug.
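The mitigation is the same at every link: re-normalize at the handoff instead of trusting the upstream tool. A minimal sketch; sanitize_handoff is a hypothetical guard, and the chained tool names in the comments mirror the example above.

    import unicodedata

    def sanitize_handoff(text: str) -> str:
        # Hypothetical guard applied at every agent-to-agent or tool-to-tool boundary:
        # NFKC-fold and drop format characters instead of trusting upstream output.
        folded = unicodedata.normalize("NFKC", text)
        return "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")

    # Each link re-sanitizes its input rather than assuming the previous tool was clean:
    #   summary = summarize_text(sanitize_handoff(scrape_webpage(url)))
    #   send_email(body=sanitize_handoff(summary))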
The longer agents run, the more trust boundaries they cross, the more these attacks compound. A single poisoned perception can cascade through an entire system before anyone notices the instruments were lying.
The Fix
Validate perception with the same rigor as action. The input layer is a security boundary.
The principles:
- Normalize before matching: NFKC normalization, strip invisible characters, collapse confusables using UTS #39 skeleton algorithm
- Reject anomalies: Mixed scripts (Cyrillic + Latin in the same label), known homoglyphs, bidi control characters
- Confidence thresholds: If the model isn’t confident, don’t act
- Defense in depth: Perception filtering AND execution-time authorization
What this looks like in practice:
# Normalize and validate before the agent ever sees it
filtered_results = []
for element in query_results:
    label = normalize(element.label)    # NFKC + strip invisibles
    if has_mixed_scripts(label):        # Cyrillic + Latin = reject
        continue
    if has_homoglyphs(label):           # Skeleton != original = reject
        continue
    if element.confidence < 0.8:        # Low confidence = reject
        continue
    # Only now does the agent see this element
    filtered_results.append(element)
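The helpers carry the weight. A minimal sketch of what they could look like; the confusables map below is a tiny illustrative subset, where a real implementation would load the full UTS #39 table.

    import unicodedata

    # Tiny illustrative subset of the UTS #39 confusables data
    CONFUSABLES = {"\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p", "\u0441": "c", "\u0456": "i"}

    def normalize(label: str) -> str:
        # NFKC folds compatibility forms (e.g. fullwidth '．'), then drop format characters
        folded = unicodedata.normalize("NFKC", label)
        return "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")

    def script_of(ch: str) -> str:
        # Crude script lookup via the Unicode character name, e.g. "CYRILLIC SMALL LETTER I"
        try:
            return unicodedata.name(ch).split()[0]
        except ValueError:
            return "UNKNOWN"

    def has_mixed_scripts(label: str) -> bool:
        scripts = {script_of(ch) for ch in label if ch.isalpha()}
        return len(scripts & {"LATIN", "CYRILLIC", "GREEK"}) > 1

    def has_homoglyphs(label: str) -> bool:
        # Skeleton comparison: map known lookalikes to their Latin targets and compare
        skeleton = "".join(CONFUSABLES.get(ch, ch) for ch in label)
        return skeleton != label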
The Gauntlet
This doesn’t make perception attacks impossible. It makes them prohibitively expensive.
To succeed, an attacker can no longer just “trick the model.” They now have to thread a needle. They must craft a payload that:
- Survives strict Unicode normalization (NFKC),
- Bypasses mixed-script detection,
- Maintains high confidence scores against the DOM,
- AND passes execution-time authorization (Warrants).
We have reduced the attack surface from “any untrusted context” to “only validated signals.”
I am building the tooling to enforce these checks for both perception and execution as part of the Tenuo ecosystem.
More soon.
The New Frontier: Visual Perception Poisoning
As agents move from DOM-scraping to vision models (looking at screenshots and video), the attack surface expands from strings to pixels.
We are already seeing Adversarial Patches, where a physical sticky note on a webcam feed can trick a model into misclassifying a threat. We are also seeing Visual Prompt Injection, where text hidden in the alpha channel of an image (invisible to humans) is read perfectly by the agent’s OCR.
This matters because it breaks the “Verify with Human” defense. A human looking at the screenshot sees a blank page. The agent sees a command to export data. The agent isn’t just misinterpreting the code; it is seeing a reality that doesn’t exist.
The Stack
          PERCEPTION      REASONING     EXECUTION
               │              │             │
World ──→ [filtering] ────→ Agent ────→ [Tenuo] ────→ Tool
               │              │             │
        Validate here.  If poisoned,    Gate here.
                        already lost.
Security discussions focus on the agent’s hands (execution). We’ve barely started addressing its eyes (perception).
If you hack a pilot’s altimeter to show 10,000ft when they’re at 500ft, they will crash the plane while following perfectly correct procedures.
Agents are pilots flying on instruments. If the instruments (DOM, OCR, APIs) are lying, the agent crashes. No amount of “pilot training” (prompt engineering) fixes a broken altimeter. You can’t meditate your way out of CFIT.
Secure the instruments first.
But we have to be realistic. Instruments will eventually break. Attackers will always find a new way to blind the agent.
That is why I am building Tenuo: to enforce bounded, cryptographically proven delegation at every handoff that can limit the damage when the eyes inevitably fail. By constraining the execution layer, we ensure that even a blinded, hallucinating pilot cannot fly the plane into the ground or into airspace unrelated to the mission.
Secure the instruments. Enforce the flight plan. Then train the pilot.
Note: Security researchers often use “semantic attack” to refer to indirect prompt injection (like the Miggo Calendar exploit), where the model is tricked by the meaning of the text.
This post focuses on something different: perceptual semantic attacks, where the model is tricked by the identity of the input. Rather than convincing the agent to misbehave, these attacks fool it into misperceiving what it’s looking at.