A critical memory corruption vulnerability in vLLM versions 0.10.2 and later allows attackers to achieve remote code execution via the Completions API endpoint by sending maliciously crafted prompt embeddings.
The vulnerability resides in the tensor deserialization logic in vLLM's entrypoints/renderer.py at line 148.
When processing user-supplied prompt embeddings, the server loads serialized tensors using torch.load() without sufficient validation checks.
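The pattern at issue can be sketched as follows. This is an illustrative hardened wrapper, not vLLM's actual code: the helper name and checks are assumptions. Note that torch.load(weights_only=True) blocks arbitrary pickle objects but does not by itself validate sparse tensor indices, so the sketch rejects non-dense layouts outright.

```python
import io
import torch

def load_prompt_embedding(payload: bytes) -> torch.Tensor:
    """Hypothetical hardened replacement for a bare torch.load() call.

    weights_only=True prevents arbitrary object deserialization, but it
    does NOT validate sparse tensor indices, so sparse layouts are
    rejected before any to_dense() conversion can occur.
    """
    tensor = torch.load(io.BytesIO(payload), weights_only=True)
    if not isinstance(tensor, torch.Tensor):
        raise ValueError("payload must deserialize to a single tensor")
    if tensor.layout != torch.strided:
        raise ValueError("only dense (strided) tensors are accepted")
    return tensor

# Round-trip a benign dense embedding to show the happy path.
buf = io.BytesIO()
torch.save(torch.randn(4, 8), buf)
embedding = load_prompt_embedding(buf.getvalue())
```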
The Vulnerability Explained
A change introduced in PyTorch 2.8.0 disabled sparse tensor integrity checks by default, creating an attack vector for malicious actors.
Without proper validation, attackers can craft sparse tensors that bypass internal bounds checks, triggering an out-of-bounds memory write during the to_dense() conversion.
This memory corruption can crash the vLLM server and potentially enable arbitrary code execution within the server process.
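The bypassed bounds check can be demonstrated with PyTorch's own invariant-checking switch. The payload values below are illustrative; the point is that with checks off, an index far outside the declared shape is accepted at construction time, while re-enabling the checks rejects it immediately.

```python
import torch

# A crafted sparse payload: index 999 lies far outside the declared
# 4x4 shape. With invariant checks off (the PyTorch 2.8.0 default),
# construction succeeds and the bad index only surfaces later, e.g.
# during a to_dense() conversion.
bad_indices = torch.tensor([[0, 999], [0, 1]])
values = torch.tensor([1.0, 2.0])

accepted = torch.sparse_coo_tensor(
    bad_indices, values, (4, 4), check_invariants=False
)

# With invariant checking enabled, the same payload is rejected
# up front, before any memory-unsafe conversion can run.
try:
    torch.sparse_coo_tensor(
        bad_indices, values, (4, 4), check_invariants=True
    )
except RuntimeError as exc:
    print("rejected:", exc)
```

Deployments can also flip the check globally with torch.sparse.check_sparse_tensor_invariants.enable().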
CVE ID: CVE-2025-62164
Severity: High
CVSS Score: 8.8/10
Affected Product: vLLM (pip)
Affected Versions: ≥ 0.10.2
This vulnerability affects all deployments running vLLM as a server, particularly those that deserialize untrusted or model-provided payloads.
Any user with API access can exploit the flaw to cause denial-of-service conditions and potentially gain remote code execution.
The attack requires no special privileges, making it accessible to both authenticated and unauthenticated users, depending on the API configuration.
Organizations using vLLM in production environments, cloud deployments, or shared infrastructure face significant risk, as successful exploitation could compromise the entire server and adjacent systems.
The vLLM project has addressed this vulnerability in pull request #27204. Users should upgrade to the patched version immediately.
As a temporary mitigation, administrators should restrict API access to trusted users only and implement input validation layers that check prompt embeddings before they reach the vLLM processing pipeline.
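Such a validation layer might look like the sketch below. The function name, dimension limits, and accepted dtypes are assumptions for illustration, not vLLM's API; the idea is to reject anything that is not a well-shaped, finite, dense float tensor before it reaches the serving pipeline.

```python
import torch

def validate_prompt_embedding(t: torch.Tensor,
                              hidden_dim: int = 4096,
                              max_tokens: int = 8192) -> torch.Tensor:
    """Illustrative pre-flight checks for a user-supplied embedding.

    hidden_dim and max_tokens are hypothetical deployment-specific
    limits; adjust them to the served model's configuration.
    """
    if not isinstance(t, torch.Tensor) or t.layout != torch.strided:
        raise ValueError("only dense tensors are accepted")
    if t.dtype not in (torch.float16, torch.bfloat16, torch.float32):
        raise ValueError(f"unexpected dtype {t.dtype}")
    if t.dim() != 2 or t.shape[1] != hidden_dim or t.shape[0] > max_tokens:
        raise ValueError(f"unexpected shape {tuple(t.shape)}")
    if not torch.isfinite(t).all():
        raise ValueError("embedding contains NaN or Inf values")
    return t.contiguous()
```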
The vulnerability was discovered and responsibly disclosed by the AXION Security Research Team, highlighting the importance of coordinated vulnerability disclosure in the AI infrastructure ecosystem.
