Hackers Can Manipulate Claude AI APIs with Indirect Prompts to Steal User Data – Cyber Web Spider Blog

Hackers can exploit Anthropic’s Claude AI to steal sensitive user data. By leveraging the model’s newly added network capabilities in its Code Interpreter tool, attackers can use indirect prompt injection to extract private information, such as chat histories, and upload it directly to their own accounts.

This finding, detailed in Johann Rehberger’s October 2025 blog post, underscores the growing risks as AI systems become increasingly connected to the outside world.

According to Rehberger, the flaw hinges on Claude’s default “Package managers only” setting, which permits network access to a limited list of approved domains, including api.anthropic.com.

While intended to let Claude install software packages securely from sites like npm, PyPI, and GitHub, this allow-list opens a backdoor: api.anthropic.com is reachable with any API key, not only the logged-in user’s. Rehberger showed that malicious prompts hidden in documents or user inputs can trick the AI into executing code that accesses user data.
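To illustrate why a domain allow-list alone cannot stop this, here is a minimal sketch of a host-only egress check. It is not Anthropic’s actual filter: the function names, the exact host entries for npm, PyPI, and GitHub, and the `/v1/files` path are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list. The article names api.anthropic.com and
# "sites like npm, PyPI, and GitHub"; the exact hostnames are assumed.
ALLOWED_HOSTS = {
    "api.anthropic.com",
    "pypi.org",
    "registry.npmjs.org",
    "github.com",
}


def egress_allowed(url: str) -> bool:
    """Return True if the URL's host is on the allow-list (host-only check)."""
    host = (urlsplit(url).hostname or "").lower()
    return host in ALLOWED_HOSTS


def simulate_request(url: str, api_key: str) -> str:
    """Model a sandbox request; the filter never inspects api_key."""
    return "allowed" if egress_allowed(url) else "blocked"


# Same destination, different credentials: the filter cannot tell them apart.
victim = simulate_request("https://api.anthropic.com/v1/files", "victim-key")
attacker = simulate_request("https://api.anthropic.com/v1/files", "attacker-key")
other = simulate_request("https://evil.example/upload", "attacker-key")

print(victim, attacker, other)
```

Because the decision depends only on the hostname, a request carrying the attacker’s key is treated the same as one carrying the user’s key, while the unlisted host is blocked.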

Indirect Prompt Injection Attack Chain

Rehberger’s proof-of-concept attack begins with indirect prompt injection, where an adversary embeds harmful instructions in seemingly innocuous content, like a file the user asks Claude to analyze.

Leveraging Claude’s recent “memory” feature, which lets the AI reference past conversations, the payload instructs the model to extract recent chat data and save it as a file in the Code Interpreter’s sandbox, specifically at /mnt/user-data/outputs/hello.md.

Next, the exploit forces Claude to run Python code using the Anthropic SDK. This code sets the environment variable for the attacker’s API key and uploads the file via Anthropic’s Files API.

Crucially, because the request authenticates with the attacker’s key, the upload lands in the attacker’s account, not the victim’s. “This worked on the first try,” Rehberger noted, though Claude later grew wary of obvious API keys, requiring obfuscation with benign code like simple print statements to evade detection.

A demo video and screenshots illustrate the process: an attacker views their empty console, the victim processes a tainted document, and moments later, the stolen file appears in the attacker’s dashboard. Each upload can carry up to 30MB, and multiple uploads are possible. This “AI kill chain” could extend to other allow-listed domains, amplifying the threat.

Rehberger responsibly disclosed the issue to Anthropic on October 25, 2025, via HackerOne. Anthropic initially dismissed it as a “model safety issue” and out of scope, but acknowledged it as a valid vulnerability on October 30, citing a process error.

The company’s documentation already warns of data exfiltration risks from network egress, advising users to monitor sessions closely and halt suspicious activity.

Experts like Simon Willison highlight this as an instance of the “lethal trifecta” in AI security: access to private data, exposure to untrusted content, and the ability to communicate externally.

For mitigation, Anthropic could enforce sandbox rules limiting API calls to the logged-in user’s account. Users should disable network access or allow-list domains sparingly, avoiding the false security of defaults.
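One way to picture an account-bound rule is the sketch below. It is purely illustrative and not a description of any Anthropic control: it assumes a hypothetical egress proxy that can read request headers, a hypothetical per-session key, and assumed package host names. The only detail taken from the real API is the `x-api-key` request header.

```python
import hmac
from dataclasses import dataclass, field
from urllib.parse import urlsplit

# Hypothetical host groups; the specific package hosts are assumptions.
PACKAGE_HOSTS = {"pypi.org", "registry.npmjs.org", "github.com"}
ACCOUNT_BOUND_HOSTS = {"api.anthropic.com"}


@dataclass(frozen=True)
class EgressRequest:
    """A simplified outbound request as seen by a hypothetical proxy."""
    url: str
    headers: dict = field(default_factory=dict)


def decide(request: EgressRequest, session_api_key: str) -> str:
    """Allow package hosts; allow the API host only with the session's key."""
    host = (urlsplit(request.url).hostname or "").lower()
    if host in PACKAGE_HOSTS:
        return "allow"
    if host in ACCOUNT_BOUND_HOSTS:
        # Header names are case-insensitive, so normalize before lookup.
        lowered = {k.lower(): v for k, v in request.headers.items()}
        presented = lowered.get("x-api-key")
        if presented is not None and hmac.compare_digest(
            presented.encode(), session_api_key.encode()
        ):
            return "allow"
        return "deny"
    return "deny"


session_key = "session-key"  # hypothetical key tied to the logged-in user
own = decide(
    EgressRequest("https://api.anthropic.com/v1/files", {"X-Api-Key": "session-key"}),
    session_key,
)
foreign = decide(
    EgressRequest("https://api.anthropic.com/v1/files", {"x-api-key": "attacker-key"}),
    session_key,
)
print(own, foreign)
```

Under these assumptions, a request using a key other than the session’s own is denied even though the destination host is on the allow-list, which is the property a host-only check lacks.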

As AI tools like Claude integrate deeper into workflows, such exploits are a reminder that connectivity brings risk. Without robust safeguards, what begins as helpful automation could become a hacker’s playground.
