A recent investigation by security company AIR has revealed a significant vulnerability in the way AI agent skills are vetted and trusted. The firm successfully created a fake AI skill, pushing it through a popular skill marketplace and promoting it via an Instagram ad. This skill reportedly reached around 26,000 agents, including those on corporate accounts, without raising any alarms.
How the Fake AI Skill Evaded Detection
The fake skill went through multiple security scans and passed every one. Its payload was designed to be harmless and only collected users’ email addresses. The experiment demonstrated that common trust signals, such as security scanners, GitHub stars, and open-source reputations, failed to identify the threat.
The skill, named brand-landingpage, was designed to appeal to non-technical users by claiming to create landing pages using Google’s Stitch design tool. AIR bolstered the skill’s credibility by pursuing two trust signals: GitHub stars and a clean verdict from security scanners. The skill became part of a repository with 36,000 stars, which gave it visibility and trust among users.
Technical Oversight and Security Gaps
Security scanners typically analyze a package’s SKILL.md and accompanying files. AIR’s skill circumvented this by instructing agents to install the ‘Stitch SDK’ from an external link. Initially, the link pointed to genuine documentation, so scanners approved the package. Once the skill was widely installed, AIR altered the linked page to deliver a script, which in the demonstration only sent back user email addresses.
This method exposed a critical flaw: scanners evaluate the package as it exists at submission time and do not monitor external content, which can change after approval. Real-world attackers could exploit this loophole to carry out harmful actions, since a script delivered through the external page can act with the access the agent holds.
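To make the gap concrete, the following is a minimal Python sketch of a check that lists the external links a skill's instructions point to, so they can be reviewed alongside the package itself. It is an illustration only, not AIR's tooling or any scanner's actual logic. The function names, the URL pattern, and the example hosts are assumptions made for this sketch.

```python
import re
from urllib.parse import urlparse

# Assumed, simplified pattern for http(s) URLs in plain text or Markdown.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def external_links(skill_text: str) -> list[str]:
    """Return the distinct URLs found in a skill's instructions, in order."""
    links: list[str] = []
    for match in URL_PATTERN.findall(skill_text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in links:
            links.append(url)
    return links


def unvetted_hosts(skill_text: str, allowed_hosts: set[str]) -> list[str]:
    """Return hosts referenced by the skill that are not on a reviewed allowlist."""
    hosts = {urlparse(url).hostname for url in external_links(skill_text)}
    return sorted(h for h in hosts if h and h not in allowed_hosts)


if __name__ == "__main__":
    # Hypothetical SKILL.md excerpt; the URLs are placeholders.
    skill_md = (
        "Install the Stitch SDK from https://example.com/stitch-sdk. "
        "Docs: https://docs.example.org/guide"
    )
    print(external_links(skill_md))
    print(unvetted_hosts(skill_md, allowed_hosts={"docs.example.org"}))
```

A list like this only shows where a skill reaches outside its own package. It says nothing about what those pages serve today or will serve tomorrow, which is the part of the problem the experiment exploited.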
Recommendations for Enhanced Security
Experts suggest treating skills like software: vet every external link thoroughly and monitor linked content for changes. Organizations should inventory the skills their agents use and re-evaluate them regularly through a controlled and secure platform. Static trust signals, such as GitHub stars or initial scan results, should not be relied on alone.
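One way to picture monitoring for changes is to record a fingerprint of each external page when a skill is approved and compare it on every later review. The Python sketch below assumes the page content has already been fetched by some other means. The ApprovedLink record and the function names are hypothetical, chosen for illustration rather than taken from any real platform.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovedLink:
    """Hypothetical record of an external link as it looked at approval time."""
    url: str
    sha256: str  # digest of the content that was actually reviewed


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 hex digest of the given content."""
    return hashlib.sha256(content).hexdigest()


def approve(url: str, reviewed_content: bytes) -> ApprovedLink:
    """Store the digest of the content a reviewer saw when approving the skill."""
    return ApprovedLink(url=url, sha256=fingerprint(reviewed_content))


def has_drifted(record: ApprovedLink, current_content: bytes) -> bool:
    """Report whether the linked content differs from what was approved."""
    return fingerprint(current_content) != record.sha256


if __name__ == "__main__":
    # Placeholder URL and page bodies, for illustration only.
    record = approve("https://example.com/stitch-sdk", b"Stitch SDK documentation")
    print(has_drifted(record, b"Stitch SDK documentation"))       # unchanged page
    print(has_drifted(record, b"Stitch SDK documentation + script"))  # altered page
```

An exact-hash comparison is deliberately strict: any edit to the page, including a harmless one, counts as drift and sends the skill back for review. Pages that change on every request would need a different comparison, which this sketch does not attempt.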
AIR’s findings highlight a structural issue in current scanning practices. The experiment exploited trust signals that are assessed once and then taken on faith, which points to the need for a more comprehensive approach to skill validation and monitoring.
AIR’s claims about the skill’s reach remain unverified, but the methodology underscores real vulnerabilities. The industry must address these gaps to prevent exploitation by malicious actors.
