# vLLM Crash Troubleshooting

## Root Cause

While processing a request, vLLM calls the get_structured_output_key function. When handling the structured_outputs parameter, none of the supported cases matched, so the function raised a **No valid structured output parameter found** ValueError. EngineCore does not catch this exception, so the engine process crashed; the APIServer then detected that the engine was dead and exited as well.

## Monitoring HTTP Requests with tshark

### Run the commands below from a suitable directory; tshark needs root privileges, so use the root account

##### Install tshark (already done)

```
sudo apt-get install tshark
```

##### Create a shell script

```
vim hook_vllm_gpt-oss-120b.sh
```

Script contents:

```
#!/bin/bash
# ========================================
# Monitor HTTP requests and responses to the local v1/chat/completions endpoint
# Keeps only the most recent entries (tail -n 200 lines, roughly the last 20),
# including streaming responses such as SSE
# Uses tshark with TCP stream following
# ========================================

# Configuration
INTERFACE="lo"           # local loopback interface
PORT="8077"
ENDPOINT="/v1/chat/completions"
LOG_FILE="/hook/chat_completions.log"
TEMP_LOG="/hook/chat_completions.tmp"
PID_FILE="/hook/hook_vllm_gpt-oss-120b.pid"

# Refuse to start if already running
if [ -f "$PID_FILE" ]; then
    if ps -p $(cat "$PID_FILE") > /dev/null 2>&1; then
        echo "[ERROR] Monitor script is already running (PID: $(cat $PID_FILE))"
        exit 1
    else
        rm -f "$PID_FILE"
    fi
fi

# Record our PID
echo $$ > "$PID_FILE"

# Truncate the log
> "$LOG_FILE"

echo "✅ Monitoring http://127.0.0.1:$PORT$ENDPOINT"
echo "📝 Log file: $LOG_FILE"
echo "⏳ Keeping only the most recent entries; press Ctrl+C to stop"

# Signal handling: clean up the PID file
trap 'rm -f "$PID_FILE"; echo "⏹️ Monitoring stopped"; exit 0' SIGINT SIGTERM

# Follow TCP streams with tshark.
# NB: "log" is a reserved built-in in awk, so the log path is passed as "logfile".
sudo tshark \
    -i "$INTERFACE" \
    -f "tcp port $PORT and host 127.0.0.1" \
    -q \
    -z "follow,tcp,ascii" \
    2>/dev/null | \
stdbuf -oL awk -v endpoint="$ENDPOINT" -v logfile="$LOG_FILE" -v temp="$TEMP_LOG" '
BEGIN {
    RS = "\n\n"
    ORS = ""
}

# Split the capture into chunks and inspect each TCP stream
{
    if (match($0, /GET|POST|PUT|DELETE|HTTP/) && index($0, endpoint)) {
        # Extract a timestamp (the first line of tshark output carries one)
        if (match($0, /Following.*on port [0-9]+$/)) {
            ts = substr($0, RSTART, RLENGTH)
            gsub(/.*on/, "on", ts)
        } else {
            ts = "unknown time"
        }

        # Extract the request line and headers
        n = split($0, lines, /\n/)
        for (i = 1; i <= n; i++) {
            if (lines[i] ~ /(GET|POST|PUT|DELETE)/ && index(lines[i], endpoint)) {
                request_line = lines[i]
            }
            if (lines[i] ~ /Content-Type:/ || lines[i] ~ /Authorization:/ || lines[i] ~ /User-Agent:/) {
                headers = headers "\n  " lines[i]
            }
        }

        # Extract the request body (normally after the blank line)
        body = ""
        in_body = 0
        for (i = 1; i <= n; i++) {
            if (lines[i] == "" || lines[i] ~ /HTTP\/[0-9.]/) {
                in_body = 1
                continue
            }
            if (in_body && lines[i] !~ /(No response found|Following)/) {
                body = body lines[i] "\n"
            }
        }

        # Extract the response part (lines starting with HTTP/)
        response = ""
        for (i = 1; i <= n; i++) {
            if (lines[i] ~ /^HTTP\// && i > 1) {
                for (j = i; j <= n; j++) {
                    if (lines[j] !~ /Following/) {
                        response = response lines[j] "\n"
                    }
                }
                break
            }
        }

        # Build the log entry
        entry = "========================================\n"
        entry = entry "🕒 " ts "\n"
        entry = entry "📤 Request: " request_line "\n"
        if (headers != "") {
            entry = entry "📎 Headers:\n" headers "\n"
        }
        if (body != "") {
            entry = entry "📦 Request body:\n" body "\n"
        }
        if (response != "") {
            entry = entry "📥 Response:\n" response "\n"
        }
        entry = entry "========================================\n\n"

        # Append to the log and keep only the most recent 200 lines
        cmd = "echo -e \"" entry "\" >> " logfile " && tail -n 200 " logfile " > " temp " && mv " temp " " logfile
        system(cmd)

        # Reset per-entry state
        headers = ""
        body = ""
        response = ""
    }
}
'

# Clean up on normal exit
rm -f "$PID_FILE"
```

##### Make it executable

```
chmod +x hook_vllm_gpt-oss-120b.sh
```

##### Run the script in the background

```
nohup /hook/hook_vllm_gpt-oss-120b.sh > /dev/null 2>&1 &
```

##### Tail the request log

```
tail -f /hook/chat_completions.log
```

##### Stop the script

```
pkill -f hook_vllm_gpt-oss-120b.sh
```

### Implementation of get_structured_output_key

#### Branch: release/v0.11.0

https://github.com/vllm-project/vllm/blob/releases/v0.11.0/vllm/v1/structured_output/request.py

```python
def get_structured_output_key(sampling_params: SamplingParams) -> StructuredOutputKey:
    params = sampling_params.structured_outputs
    assert params is not None, "params can't be None."
    if params.json is not None:
        if not isinstance(params.json, str):
            json_str = json.dumps(params.json)
        else:
            json_str = params.json
        return (StructuredOutputOptions.JSON, json_str)
    elif params.json_object:
        return (StructuredOutputOptions.JSON_OBJECT, "")
    elif params.regex is not None:
        return (StructuredOutputOptions.REGEX, params.regex)
    elif params.choice is not None:
        if not isinstance(params.choice, str):
            json_str = json.dumps(params.choice)
        else:
            json_str = params.choice
        return (StructuredOutputOptions.CHOICE, json_str)
    elif params.grammar is not None:
        return (StructuredOutputOptions.GRAMMAR, params.grammar)
    elif params.structural_tag is not None:
        return (StructuredOutputOptions.STRUCTURAL_TAG, params.structural_tag)
    else:
        raise ValueError("No valid structured output parameter found")
```

#### Branch: release/v0.10.2

https://github.com/vllm-project/vllm/blob/releases/v0.10.2/vllm/v1/structured_output/request.py

```python
def get_structured_output_key(
        sampling_params: SamplingParams) -> StructuredOutputKey:
    params = sampling_params.guided_decoding
    assert params is not None, "params can't be None."
    if params.json is not None:
        if not isinstance(params.json, str):
            json_str = json.dumps(params.json)
        else:
            json_str = params.json
        return (StructuredOutputOptions.JSON, json_str)
    elif params.json_object:
        return (StructuredOutputOptions.JSON_OBJECT, "")
    elif params.regex is not None:
        return (StructuredOutputOptions.REGEX, params.regex)
    elif params.choice is not None:
        if not isinstance(params.choice, str):
            json_str = json.dumps(params.choice)
        else:
            json_str = params.choice
        return (StructuredOutputOptions.CHOICE, json_str)
    elif params.grammar is not None:
        return (StructuredOutputOptions.GRAMMAR, params.grammar)
    elif params.structural_tag is not None:
        return (StructuredOutputOptions.STRUCTURAL_TAG, params.structural_tag)
    else:
        raise ValueError("No valid structured output parameter found")
```

> The only difference between the two versions is which `SamplingParams` attribute is read (`structured_outputs` vs. `guided_decoding`); the branching logic is otherwise identical.

### SamplingParams

https://github.com/vllm-project/vllm/blob/releases/v0.11.0/vllm/sampling_params.py

```python
# Other parameters omitted
class SamplingParams(...):
    # 💡 The initialization logic is fine: the defaults are None
    structured_outputs: Optional[StructuredOutputsParams] = None
    guided_decoding: Optional[GuidedDecodingParams] = None

    @staticmethod
    def from_optional(
        # 💡 The default here is also None
        structured_outputs: Optional[StructuredOutputsParams] = None,
        ...
    ) -> "SamplingParams":
        # 💡 guided_decoding will be removed in v0.12.0 in favor of
        # structured_outputs; v0.11.0 keeps this compatibility shim
        if guided_decoding is not None:
            warnings.warn(
                "guided_decoding is deprecated. This will be removed in "
                "v0.12.0 or v1.0.0, which ever is soonest. Please use "
                "structured_outputs instead.",
                DeprecationWarning,
                stacklevel=2)
            structured_outputs = guided_decoding
            guided_decoding = None
        return SamplingParams(
            ...
            structured_outputs=structured_outputs,
        )
```

### StructuredOutputOptions

```python
class StructuredOutputOptions(enum.Enum):
    JSON = enum.auto()
    JSON_OBJECT = enum.auto()
    REGEX = enum.auto()
    GRAMMAR = enum.auto()
    CHOICE = enum.auto()
    STRUCTURAL_TAG = enum.auto()
```

> 💡 Only these six types are supported; each corresponds to a different field under `structured_outputs`.

## Crash Log Excerpt

```text
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] EngineCore encountered a fatal error.
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] Traceback (most recent call last):
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 701, in run_engine_core
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] engine_core.run_busy_loop()
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 728, in run_busy_loop
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] self._process_engine_step()
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 754, in _process_engine_step
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 283, in step
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] scheduler_output = self.scheduler.schedule()
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/core/sched/scheduler.py", line 359, in schedule
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] if structured_output_req and structured_output_req.grammar:
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 45, in grammar
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] completed = self._check_grammar_completion()
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 33, in _check_grammar_completion
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] self._grammar = self._grammar.result(timeout=0.0001)
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] File "/aisoft/conda/env/vllm2/lib/python3.10/concurrent/futures/_base.py", line 458, in result
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] return self.__get_result()
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] File "/aisoft/conda/env/vllm2/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] raise self._exception
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] File "/aisoft/conda/env/vllm2/lib/python3.10/concurrent/futures/thread.py", line 58, in run
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] result = self.fn(*self.args, **self.kwargs)
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/__init__.py", line 128, in _async_create_grammar
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] key = request.structured_output_request.structured_output_key # type: ignore[union-attr]
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] File "/aisoft/conda/env/vllm2/lib/python3.10/functools.py", line 981, in __get__
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] val = self.func(instance)
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 58, in structured_output_key
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] return get_structured_output_key(self.sampling_params)
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 86, in get_structured_output_key
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] raise ValueError("No valid structured output parameter found")
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] ValueError: No valid structured output parameter found
(EngineCore_DP0 pid=2738693) Process EngineCore_DP0:
(EngineCore_DP0 pid=2738693) Traceback (most recent call last):
(APIServer pid=2738423) ERROR 10-10 10:43:10 [async_llm.py:480] AsyncLLM output_handler failed.
(APIServer pid=2738423) ERROR 10-10 10:43:10 [async_llm.py:480] Traceback (most recent call last):
(APIServer pid=2738423) ERROR 10-10 10:43:10 [async_llm.py:480] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 439, in output_handler
(APIServer pid=2738423) ERROR 10-10 10:43:10 [async_llm.py:480] outputs = await engine_core.get_output_async()
(APIServer pid=2738423) ERROR 10-10 10:43:10 [async_llm.py:480] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 846, in get_output_async
(APIServer pid=2738423) ERROR 10-10 10:43:10 [async_llm.py:480] raise self._format_exception(outputs) from None
(APIServer pid=2738423) ERROR 10-10 10:43:10 [async_llm.py:480] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=2738693) self.run()
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=2738693) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=2738693) raise e
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 701, in run_engine_core
(EngineCore_DP0 pid=2738693) engine_core.run_busy_loop()
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 728, in run_busy_loop
(EngineCore_DP0 pid=2738693) self._process_engine_step()
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 754, in _process_engine_step
(EngineCore_DP0 pid=2738693) outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 283, in step
(EngineCore_DP0 pid=2738693) scheduler_output = self.scheduler.schedule()
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/core/sched/scheduler.py", line 359, in schedule
(EngineCore_DP0 pid=2738693) if structured_output_req and structured_output_req.grammar:
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 45, in grammar
(EngineCore_DP0 pid=2738693) completed = self._check_grammar_completion()
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 33, in _check_grammar_completion
(EngineCore_DP0 pid=2738693) self._grammar = self._grammar.result(timeout=0.0001)
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/concurrent/futures/_base.py", line 458, in result
(EngineCore_DP0 pid=2738693) return self.__get_result()
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(EngineCore_DP0 pid=2738693) raise self._exception
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/concurrent/futures/thread.py", line 58, in run
(EngineCore_DP0 pid=2738693) result = self.fn(*self.args, **self.kwargs)
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/__init__.py", line 128, in _async_create_grammar
(EngineCore_DP0 pid=2738693) key = request.structured_output_request.structured_output_key # type: ignore[union-attr]
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/functools.py", line 981, in __get__
(EngineCore_DP0 pid=2738693) val = self.func(instance)
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 58, in structured_output_key
(EngineCore_DP0 pid=2738693) return get_structured_output_key(self.sampling_params)
(EngineCore_DP0 pid=2738693) File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 86, in get_structured_output_key
(EngineCore_DP0 pid=2738693) raise ValueError("No valid structured output parameter found")
(EngineCore_DP0 pid=2738693) ValueError: No valid structured output parameter found
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] Error in chat completion stream generator.
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] Traceback (most recent call last):
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_chat.py", line 574, in chat_completion_stream_generator
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] async for res in result_generator:
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 387, in generate
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] out = q.get_nowait() or await q.get()
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/output_processor.py", line 59, in get
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] raise output
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 439, in output_handler
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] outputs = await engine_core.get_output_async()
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 846, in get_output_async
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] raise self._format_exception(outputs) from None
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
[rank0]:[W1010 10:43:10.666166970 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=2738423) INFO: Shutting down
(APIServer pid=2738423) INFO: Waiting for application shutdown.
(APIServer pid=2738423) INFO: Application shutdown complete.
(APIServer pid=2738423) INFO: Finished server process [2738423]
```
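
## Reproducing the Failure Path

To make the failure mode concrete: the crash requires a request whose structured-output params object is non-None yet has every recognized field unset, so control falls through every branch to the `ValueError`. The sketch below is a self-contained mimic of the branch logic quoted above, not vLLM's actual classes: the `StructuredOutputsParams` dataclass, its field name `json_schema` (named `json` in vLLM itself), and the `safe_key` wrapper are stand-ins invented for illustration.

```python
import enum
import json
from dataclasses import dataclass
from typing import Any, Optional


class StructuredOutputOptions(enum.Enum):
    JSON = enum.auto()
    JSON_OBJECT = enum.auto()
    REGEX = enum.auto()
    GRAMMAR = enum.auto()
    CHOICE = enum.auto()
    STRUCTURAL_TAG = enum.auto()


# Hypothetical stand-in for the params object: every field defaults to
# None/False, so an "empty" instance matches none of the branches below.
@dataclass
class StructuredOutputsParams:
    json_schema: Optional[Any] = None  # called `json` in vLLM itself
    json_object: bool = False
    regex: Optional[str] = None
    choice: Optional[list] = None
    grammar: Optional[str] = None
    structural_tag: Optional[str] = None


def get_structured_output_key(params: StructuredOutputsParams):
    # Same branch order as the vLLM function quoted earlier.
    if params.json_schema is not None:
        value = (params.json_schema if isinstance(params.json_schema, str)
                 else json.dumps(params.json_schema))
        return (StructuredOutputOptions.JSON, value)
    elif params.json_object:
        return (StructuredOutputOptions.JSON_OBJECT, "")
    elif params.regex is not None:
        return (StructuredOutputOptions.REGEX, params.regex)
    elif params.choice is not None:
        return (StructuredOutputOptions.CHOICE, json.dumps(params.choice))
    elif params.grammar is not None:
        return (StructuredOutputOptions.GRAMMAR, params.grammar)
    elif params.structural_tag is not None:
        return (StructuredOutputOptions.STRUCTURAL_TAG, params.structural_tag)
    else:
        # This is the branch the crashing request hit.
        raise ValueError("No valid structured output parameter found")


def safe_key(params: StructuredOutputsParams):
    """Defensive wrapper: treat an all-empty params object as "no structured
    output requested" instead of letting the error escape into the engine."""
    try:
        return get_structured_output_key(params)
    except ValueError:
        return None
```

Guarding the lookup as in `safe_key`, or rejecting an all-empty params object at request validation time before it ever reaches the scheduler, would keep a single malformed request from taking EngineCore (and with it the APIServer) down.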