Ollama's new engine for multimodal models
Published on: 2025-07-06 12:43:27
Ollama now supports multimodal models through its new engine, starting with new vision multimodal models:
General Multimodal Understanding & Reasoning
Llama 4 Scout
ollama run llama4:scout
(Note: this is a 109-billion-parameter mixture-of-experts model.)
Example: asking location-based questions about a video frame:
You can then ask follow-up questions:
ollama@ollamas-computer ~ % ollama run llama4:scout
>>> what do you see in this image? /Users/ollama/Downloads/multimodal-example1.png
Added image '/Users/ollama/Downloads/multimodal-example1.png'
The image depicts a scenic waterfront area with a prominent clock tower at its center. The clock tower, likely the Ferry Building Clock Tower in San Francisco, features a tall structure with a clock face on each side and an American flag flying atop.
In the foreground, there are several buildings, including what appears to be a marketplace or shopping center, with a parking lot and a street lined with cars. The background showcases a l...
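The same kind of image question can also be sent from a script instead of the interactive prompt. The following is a minimal sketch, not taken from the article: it assumes a local Ollama server listening on its default address (http://localhost:11434) and uses the REST API's /api/generate endpoint, which accepts base64-encoded images in an "images" list. The helper names build_payload and ask_about_image are hypothetical, chosen only for illustration.

```python
import base64
import json
import urllib.request

# Assumption: a local Ollama server on its default address and port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON body for one non-streaming request with a single image.

    (Hypothetical helper, named for this example only.)
    """
    return {
        "model": model,
        "prompt": prompt,
        # The API expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete JSON reply instead of a stream of chunks.
        "stream": False,
    }


def ask_about_image(model: str, prompt: str, image_path: str,
                    url: str = OLLAMA_URL) -> str:
    """Send an image plus a question to the server and return the answer text."""
    with open(image_path, "rb") as f:
        payload = build_payload(model, prompt, f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    # Model name, question, and image path mirror the terminal session above.
    print(ask_about_image(
        "llama4:scout",
        "what do you see in this image?",
        "/Users/ollama/Downloads/multimodal-example1.png",
    ))
```

Setting "stream" to False keeps the sketch short; a streaming client would instead read the reply line by line and concatenate the partial "response" fields.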