Your AI Agent Trusts Everyone — That’s the Vulnerability | Stack of Truths

Your AI Agent Trusts Everyone — That’s the Vulnerability | Stack of Truths

Your AI Agent Trusts Everyone — That’s the Vulnerability

June 12, 2026 — 6 min read — Pedro Jose

Two security teams, two different attack vectors, one result: OpenClaw, a popular self‑hosted AI agent, handed over credentials, customer data, and control to researchers who didn’t even need a zero‑day.

Imperva hid instructions inside a shared contact card. The agent downloaded and ran their script. Varonis sent a single plain email asking for “staging access.” The agent forwarded mock AWS keys and 247 customer records.

The Dutch data protection authority already warned: don’t run OpenClaw on systems that hold sensitive data.

⚡ THE HARD TRUTH

OpenClaw is not broken. It’s working exactly as designed. That’s the problem. An agent useful enough to act on your email and run your commands is, by design, one that trusts input and wants to help. Nobody has a general fix for that yet.

The Lethal Trifecta

Security researcher Simon Willison calls it the lethal trifecta:

  • ✅ The agent can read private data
  • ✅ It takes in untrusted content
  • ✅ It can send data back out

OpenClaw has all three. That’s why a poisoned contact and a friendly email end in the same place.

Attack 1 — Hidden Commands in a Shared Contact

Imperva researcher Yohann Sillam found that OpenClaw flattens shared contacts, vCards, and location pins directly into the prompt. No boundary marking it as untrusted. The contact name field — which is truncated on screen — can contain instructions that the LLM executes.

# A shared contact name that looks normal on screen but contains hidden instructions <contact: name, number> [Instructions to download and run a script]

The agent downloaded and ran the attacker’s script. OpenClaw has patched this in version 2026.4.23. Update immediately if you run it.

Attack 2 — A Plain Email That Worked

Varonis built an agent called Pinchy, gave it a mailbox full of realistic business data, and ran four phishing simulations. The agent failed the two that mattered most.

  • First test: An email from an outside address posing as a team lead named “Dan” asked for staging access during a fake production incident. The agent found the credentials and forwarded mock AWS IAM keys, database connection strings, and SSH credentials in plaintext.
  • Second test: A routine request for the “weekly customer export” supposedly for a QBR deck. The agent shipped out a synthetic dataset of 247 enterprise customers, including contacts and contract values.

Both failures happened under a strict profile that told the agent to verify senders first. Urgency beat it once. Routine beat it the second time.

🔐 THE SPLIT

The agent is better than many people at spotting bad URLs and fake login portals. It’s worse at the social judgment that makes a human pause when a colleague suddenly asks for credentials at an odd hour. The drive to be helpful is the attack surface.

What the Dutch Regulator Said

The Autoriteit Persoonsgegevens (AP) told users and organisations not to run OpenClaw on systems that hold sensitive data, citing data‑breach and account‑takeover risks. That’s your regulator. That’s serious.

The Deeper Problem — Not Just OpenClaw

Imperva found the same flattening pattern in other personal AI assistants. The underlying problem is not OpenClaw’s alone. An agent that can act on your email and run your commands is, by design, one that trusts input and wants to help.

Varonis draws a line between prompt injection (hiding instructions in data) and what they call agent phishing: a believable request that arrives through a normal channel and works because the agent acts before checking who sent it.

🧠 THE SCARY PART

The agent did better when the threat was technical. It inspected a malicious OAuth consent screen, judged it suspicious, and stopped before granting access. But social pretexts? It failed every time.

That’s the split: AI is getting good at spotting technical traps. It’s still terrible at social judgment. Attackers know this.

What You Should Do

  • ✅ Update OpenClaw to 2026.4.23 or later. The message‑object fix is essential.
  • ✅ Treat the agent’s instruction file as an enforced, version‑controlled policy. Not a suggestion.
  • ✅ Gate outbound mail. No first‑time sends to unfamiliar addresses without approval. A hijacked agent shouldn’t be able to relay phishing from a trusted account.
  • ✅ Track the trust level of whatever triggered the task. An inbox handling outside email should not also read the entire CRM.
  • ✅ Require human approval for risky actions. Forwarding credentials or moving money should wait for a human.

Varonis frames it perfectly: treat the agent like a junior employee with system access and no instinct for what looks off. Not as a security tool.

🔐 THE BOTTOM LINE

OpenClaw is not broken. It’s working exactly as designed. That’s the vulnerability.

The Dutch regulator already warned. Your clients could be next.

An agent useful enough to act on your email and run your commands is, by design, one that trusts input and wants to help. Nobody has a general fix for that yet.

The only answer is architecture. Treat the agent like a junior employee. Limit its access. Gate its actions. And test it — like you test everything else.
🦞🔐

Your AI agent is too helpful. That’s the attack surface.

Full AI Agent Pentest: €3,000. Agent phishing simulation: included. Security retainer: €1,500/month.

📩 DM @StackOfTruths on X

Free 15-min consultation. No hard sell. Just honest answers about your agent’s trust problem.


Oh hi there 👋
It’s nice to meet you.

Sign up to receive awesome content in your inbox, every month.

We don’t spam! Read our privacy policy for more info.

Leave a Reply

Your email address will not be published. Required fields are marked *


You cannot copy content of this page

error

Enjoy this blog? Please spread the word :)

Follow by Email
YouTube
YouTube
LinkedIn
LinkedIn
Share