A serious security vulnerability has been disclosed in sglang If successfully exploited, it could result in remote code execution on vulnerable systems.
Vulnerability, tracked as CVE-2026-5760Holds a CVSS score of 9.8 out of 10.0. This is described as a case of command injection to execute arbitrary code.
SGLang is a high-performance, open-source serving framework for large language models and multimodal models. The official GitHub project has been forked over 5,500 times and starred 26,100 times.
According to the CERT Coordination Center (CERT/CC), the vulnerability affects the reranking endpoint “/v1/rerank”, which allows an attacker to achieve arbitrary code execution in the context of the SGLang service via a specially crafted GPT-Generated Unified Format (GGUF) model file.
“An attacker exploited this vulnerability by creating a malicious GPT Generated Unified Format (GGUF) model file with a crafted tokenizer.chat_template parameter that contains a Jinja 2 server-side template injection (SSTI) payload with a trigger phrase to activate the vulnerable code path,” CERT/CC said in an advisory issued today.
“The victim then downloads and loads the model into SGLang, and when a request reaches the “/v1/rerank” endpoint, the malicious template is served, which executes the attacker’s arbitrary Python code on the server. This sequence of events enables the attacker to achieve remote code execution (RCE) on the SGLang server.”
Per security researcher Stuart Beck, who discovered and reported the flaw, the underlying problem arises from the use of jinja2.Environment() without sandboxing instead of ImmutableSandboxedEnvironment. This, in turn, enables a malicious model to execute arbitrary Python code on the inference server.
The complete sequence of actions is as follows –
- An attacker creates a GGUF model file with a malicious tokenizer.chat_template that contains the Jinja2 SSTI payload
- The template contains the Qwen3 reranker trigger phrase to activate the vulnerable code path in “entrypoints/openai/serving_rerank.py”.
- Downloads and loads models into SGLang from sources such as Victim Hugging Face
- When a request reaches the “/v1/rerank” endpoint, SGLang reads the chat_template and renders it with jinja2.Environment().
- SSTI payload executes arbitrary Python code on server
It is worth noting that CVE-2026-5760 falls under the same vulnerability class as CVE-2024-34359 (aka Llama Drama, CVSS score: 9.7), which is a critical flaw in the llama_cpp_python Python package that can result in arbitrary code execution. The same attack surface was also patched in VLLM late last year (CVE-2025-61620, CVSS score: 6.5).
“To mitigate this vulnerability, it is recommended to use ImmutableSandboxedEnvironment instead of jinja2.Environment() to render chat templates,” CERT/CC said. “This will prevent the execution of arbitrary Python code on the server. No feedback or patches were received during the coordination process.”