vLLM has quickly become the go-to inference engine for developers who need high-throughput LLM serving. We brought vLLM to Docker Model Runner for NVIDIA GPUs on Linux, then extended it to Windows via WSL2, but macOS was left out. That changes today. Docker Model Runner now supports vllm-metal, a new backend that brings vLLM inference to macOS using Apple Silicon’s Metal GPU. If you have a Mac with an M-series chip, you can now run MLX models through vLLM with the same OpenAI-compatible API and the same Docker Model Runner workflow you already use.
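To make that concrete, here is a minimal sketch of calling the OpenAI-compatible API from Python using the standard `openai` client. The base URL, port, and model name below are illustrative assumptions, not confirmed defaults; adjust them to match your Docker Model Runner setup.

```python
from openai import OpenAI

# Point the standard OpenAI client at Docker Model Runner's local,
# OpenAI-compatible endpoint. The base URL and port are assumptions here;
# check your Model Runner configuration for the actual values.
client = OpenAI(
    base_url="http://localhost:12434/engines/v1",
    api_key="not-needed",  # local endpoint, no API key required
)

# "ai/smollm2" is a placeholder model reference; substitute any MLX model
# you have pulled through Docker Model Runner.
response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[{"role": "user", "content": "Hello from Apple Silicon!"}],
)

print(response.choices[0].message.content)
```

Because the API surface is unchanged, existing code written against vLLM on Linux or Windows should work on macOS without modification.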