The world of local AI is moving at an incredible pace, and at the
heart of this revolution is llama.cpp, the powerhouse C++ inference
engine that brings Large Language Models (LLMs) to everyday
hardware (it's also the inference engine that powers Docker Model Runner). Developers love llama.cpp
for its performance and simplicity. And we at Docker are obsessed
with making developer workflows simpler. That’s why we’re thrilled
to announce a game-changing new feature in llama.cpp: native support
for pulling and running GGUF models directly from Docker Hub.
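
In practice, that means a one-line pull-and-run. Here's a minimal sketch of what that looks like; the `-dr`/`--docker-repo` flag and the default `ai/` Docker Hub namespace are assumptions based on the upstream llama.cpp change, so check `llama-server --help` on your build:

```bash
# Assumed flag: -dr/--docker-repo pulls a GGUF model straight from
# Docker Hub (shorthand "gemma3" resolving to the ai/gemma3 repository)
# and then serves it on llama.cpp's usual local HTTP endpoint.
llama-server -dr gemma3
```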