The world of local AI is moving at an incredible pace, and at the heart of this revolution is llama.cpp, the powerhouse C++ inference engine that brings Large Language Models (LLMs) to everyday hardware (and it's also the inference engine that powers Docker Model Runner).

Developers love llama.cpp for its performance and simplicity, and we at Docker are obsessed with making developer workflows simpler. That's why we're thrilled to announce a game-changing new feature in llama.cpp: native support for pulling and running GGUF models directly from Docker Hub.
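If you want to try it, here's a minimal sketch of what the workflow can look like from the command line. The `-dr`/`--docker-repo` flag and the `ai/gemma3` model name are assumptions based on this announcement; check `llama-server --help` in your build for the exact options.

```sh
# Pull a GGUF model from Docker Hub and serve it in one step.
# Assumed flag: -dr / --docker-repo (short model names are assumed
# to default to the "ai/" namespace on Docker Hub, e.g. ai/gemma3).
llama-server -dr gemma3

# llama-server exposes an OpenAI-compatible API (default port 8080),
# so you can chat with the pulled model right away:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello from Docker Hub!"}]}'
```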
