vLLM has quickly become the go-to inference engine for developers who need high-throughput LLM serving. We brought vLLM to Docker Model Runner for NVIDIA GPUs on Linux, then extended it to Windows via WSL2, but macOS was left out. That changes today. Docker Model Runner now supports vllm-metal, a new backend that brings vLLM inference to macOS using Apple Silicon’s Metal GPU. If you have a Mac with an M-series chip, you can now run MLX models through vLLM with the same OpenAI-compatible API and the same Docker Model Runner workflow you already use.
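To make that concrete, here is a minimal sketch of calling the OpenAI-compatible API from Python using the standard `openai` client. The base URL, port, and model name below are illustrative assumptions, not confirmed defaults; adjust them to match your Docker Model Runner setup.

```python
from openai import OpenAI

# Point the standard OpenAI client at Docker Model Runner's local,
# OpenAI-compatible endpoint. The base URL and port are assumptions here;
# check your Model Runner configuration for the actual values.
client = OpenAI(
    base_url="http://localhost:12434/engines/v1",
    api_key="not-needed",  # local endpoint, no API key required
)

# "ai/smollm2" is a placeholder model reference; substitute any MLX model
# you have pulled through Docker Model Runner.
response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[{"role": "user", "content": "Hello from Apple Silicon!"}],
)

print(response.choices[0].message.content)
```

Because the API surface is unchanged, existing code written against vLLM on Linux or Windows should work on macOS without modification.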