Public GitHub Scraping Was 2015.
Agent Prompt Scraping Is 2026.
In my years as a pentester, I’ve seen the playbook evolve. What used to be a noisy, complex operation has become a quiet, devastatingly simple one.
For years, scraping public GitHub repositories was the attacker’s easiest path. Database URIs with passwords baked in. Stripe tokens in config files. AWS keys in .env files. It was a gold mine.
2015: Attackers scraped GitHub commits for secrets.
2026: Attackers scrape AI prompts, agent logs, and MCP tool calls.
The attack surface moved. Most defenders didn’t follow.
The Old Problem (Still Active)
Public GitHub scraping never stopped. It’s still a goldmine of exposed credentials.
- Database URIs — passwords baked into connection strings
- API keys — Stripe, Twilio, SendGrid, OpenAI
- Cloud credentials — AWS, GCP, Azure keys in .env files
- Webhooks — Slack, Discord, Teams with full channel access
- SSH private keys — accidentally committed, never revoked
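Here's how low the bar is. Below is a minimal sketch of that scraper in Python: a handful of simplified regex rules run over any text an attacker can pull down. Real scrapers ship hundreds of rules, and the credentials here are made up.

```python
# Minimal sketch of the 2015-era playbook: pattern-match known credential
# formats in anything public (commits, gists, pastebins). Rules simplified.
import re

PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "stripe_live_key": re.compile(r"\bsk_live_[0-9a-zA-Z]{24,}\b"),
    "db_uri_with_password": re.compile(r"\b\w+://[^:\s]+:[^@\s]+@\S+"),
    "ssh_private_key": re.compile(r"-----BEGIN (?:RSA |OPENSSH )?PRIVATE KEY-----"),
}

def scan(text: str) -> list[tuple[str, str]]:
    """Return (rule_name, matched_string) for every secret-shaped hit."""
    return [
        (name, m.group(0))
        for name, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]

# A fake commit with made-up credentials baked into the connection string.
leaked_commit = "DATABASE_URL=postgres://admin:hunter2@db.internal:5432/prod"
print(scan(leaked_commit))
# [('db_uri_with_password', 'postgres://admin:hunter2@db.internal:5432/prod')]
```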
🔐 New rule: Never type secrets into an AI prompt. Never let your agent read .env files unsupervised.
The New Problem (Worse Than You Think)
Credentials don’t wait for the commit anymore. They show up in:
- AI prompts — developers asking “how do I connect to my AWS bucket?” and pasting keys for context
- Agent .env reads — AI agents reading environment variables to help debug or configure (a sketch of this leak follows the list)
- MCP tool calls — Model Context Protocol calls that pass secrets between tools
- Agent logs — debugging output that captures sensitive data
- Shared chat histories — internal AI chats where secrets are pasted for help
- Agent memory — long-term memory stores that retain secrets across sessions
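The mechanics are depressingly simple. Here's a hypothetical sketch of the .env and log vectors above (the helper name is made up): one careless debug function, and a secret lands in the model's context, the provider's logs, and a plaintext file on disk at the same time.

```python
# Hypothetical failure mode: a "helpful" debugging tool dumps the whole
# environment into the model's context, so the secret leaks twice over.
import logging
import os

logging.basicConfig(filename="agent_debug.log", level=logging.DEBUG)
log = logging.getLogger("agent")

def build_debug_prompt(error: str) -> str:
    # Naive: include every environment variable "for context".
    env_dump = "\n".join(f"{k}={v}" for k, v in os.environ.items())
    prompt = f"My app fails with: {error}\nHere is my environment:\n{env_dump}"
    log.debug("Sending prompt to model: %s", prompt)  # leak #2: plaintext log on disk
    return prompt                                     # leak #1: the model provider

os.environ["AWS_SECRET_ACCESS_KEY"] = "made-up-value-for-this-demo"
prompt = build_debug_prompt("S3 upload returns 403 Forbidden")
assert "AWS_SECRET_ACCESS_KEY" in prompt  # the secret has already left your control
```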
The Tools Are Catching Up (Slowly)
GitGuardian just shipped ggshield hooks for Cursor, Claude Code, and GitHub Copilot. They scan prompts in real time, before secrets reach the model. It’s free. That’s good. But it’s not enough.
- Scanning is not pentesting. It catches known patterns. It doesn't find novel leaks (see the sketch after this list).
- MCP still has design flaws. Vulnerabilities in widely used MCP servers, with 150M+ downloads between them, have included remote code execution.
- Prompt injection bypasses scanning. An attacker can simply ask the agent for the key.
- Agent memory leaks. Secrets can persist in agent memory across sessions.
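To make the first and third points concrete, here's the failure in a few lines: a known-format rule catches a raw AWS key ID but misses the same value after one round of base64, exactly the kind of transform a prompt-injected agent can be asked to apply.

```python
# Why pattern scanning alone falls short: one trivial encoding and the
# known-format rule goes blind.
import base64
import re

AWS_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
key_id = "AKIAIOSFODNN7EXAMPLE"  # AWS's documented example key ID, not a live secret

raw = f"Here is my key: {key_id}"
wrapped = f"Here is my key: {base64.b64encode(key_id.encode()).decode()}"

print(bool(AWS_KEY.search(raw)))      # True  -> the scanner blocks this
print(bool(AWS_KEY.search(wrapped)))  # False -> this sails straight through
```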
ggshield is a great first line of defense. But it’s not a replacement for understanding how secrets actually leak in AI systems — and testing for those leaks.
What You Should Do Right Now
- Install ggshield hooks — free and easy. Stop secrets from reaching models.
- Audit your prompts — stop pasting keys into AI chats (one enforcement sketch follows this list)
- Review agent .env access — does your agent need to read all environment variables?
- Monitor MCP traffic — audit what secrets are being passed between tools.
- Pentest your AI agents — scanning catches known leaks. A red team finds the rest.
- Assume compromise — if a secret touched an AI prompt, rotate it.
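For the prompt and .env items, one enforcement option is an outbound redaction filter between your agent and the model API. This is a hypothetical sketch, not a product: the helper names, thresholds, and demo key are mine, and it belongs in front of a scanner like ggshield, not in place of it.

```python
# Hypothetical outbound filter: mask secret-shaped strings before the
# prompt leaves the machine. Pass 1 is known formats, pass 2 is entropy.
import math
import re

KNOWN_FORMATS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),          # AWS access key IDs
    re.compile(r"\bsk_live_[0-9a-zA-Z]{24,}\b"),  # Stripe live keys
]

def shannon_entropy(token: str) -> float:
    """Bits per character; long random secrets score high, prose scores low."""
    probs = [token.count(c) / len(token) for c in set(token)]
    return -sum(p * math.log2(p) for p in probs)

def mask_token(t: str) -> str:
    if t.startswith("[REDACTED"):
        return t  # already masked in pass 1
    if len(t) >= 20 and shannon_entropy(t) > 4.0:
        return "[REDACTED:high-entropy]"
    return t

def redact(prompt: str) -> str:
    for pattern in KNOWN_FORMATS:
        prompt = pattern.sub("[REDACTED:known-format]", prompt)
    return " ".join(mask_token(t) for t in prompt.split())

print(redact("Deploy with sk_live_4eC39HqLyjWDarjtT1zdp7dc as the API key"))
# Deploy with [REDACTED:known-format] as the API key
```

The entropy pass is the interesting part: it's a crude way to catch novel, random-looking secrets that no format rule knows about, at the cost of occasional false positives on hashes and IDs.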
The old attack surface was public repositories. The new attack surface is private conversations with AI.
Public GitHub scraping was 2015. Agent prompt scraping is 2026.
Don’t wait for your secrets to be the next headline.
Scanning your prompts? Good. Now pentest your agents.
I break AI agents — and find the leaks scanners miss. Full-stack security assessment for AI-assisted development.
📩 DM @StackOfTruths on X. Free 15-min consultation. No hard sell. Just honest answers about your AI agent security.