AI Tools Vulnerability Exposed

A team of security experts has uncovered a new prompt injection attack, dubbed ‘Comment and Control’, which poses a threat to several popular AI-driven code automation tools.

Discovery of the Vulnerability

The vulnerability was identified by security researcher Aonan Guan, with contributions from Zhengyu Liu and Gavin Zhong from Johns Hopkins University. Their research revealed that the attack affects major AI agents, including Anthropic’s Claude Code Security Review, Google’s Gemini CLI Action, and GitHub Copilot Agent.

In a detailed blog post released on Wednesday, Guan explained how the exploit works. The attack method allows malicious actors to take advantage of GitHub comments, such as pull request titles and issue bodies, to hijack the AI agents associated with these tools.

Mechanics of the Attack

The research illustrates the potential for attackers to manipulate Claude Code Security Review by crafting a specific pull request title. This could lead the AI to execute unauthorized commands, potentially exposing sensitive information as security findings or entries in GitHub Actions logs.

For the Gemini CLI Action, attackers can use issue comments with crafted prompts to bypass security measures and gain access to full API keys. This poses a significant risk to the integrity of automated coding tasks.

In the case of GitHub Copilot Agent, the vulnerability allows attackers to hide malicious payloads within HTML comments. This approach can circumvent environment filters, scan for sensitive data, and bypass network firewalls.

Implications and Response

The researchers warn that the Comment and Control attack could have severe consequences, as it triggers malicious prompts automatically through GitHub Actions workflows. This process requires no intervention from the victim, except for the manual assignment of issues to the Copilot by the user.

The findings have been reported to the affected companies, Anthropic, Google, and GitHub, all of which have acknowledged the issue. While Anthropic labeled the vulnerability as ‘critical’, Google and GitHub classified it differently, with varying levels of severity.

In response, the companies have taken steps to mitigate the threat, with Google awarding a $1,337 bug bounty and GitHub offering $500. These efforts highlight the importance of addressing such vulnerabilities promptly.

Broader Security Concerns

This discovery marks the first public demonstration of a cross-vendor prompt injection pattern affecting multiple AI agents. Guan emphasized that the issue is more about the architectural design, where AI agents handle powerful tools and secrets in environments that process untrusted user inputs.

Despite the existence of several defense layers, including model-level and prompt-level protections, the inherent design allows for these injections to occur. The research calls attention to the necessity for a more robust architectural approach to safeguard against such vulnerabilities in the future.

Discovery of the Vulnerability

Mechanics of the Attack

Implications and Response

Broader Security Concerns

Related Posts