GoKawiil - Ollama's new engine for multimodal models

Ollama now supports multimodal models via Ollama’s new engine, starting with new vision multimodal models:

General Multimodal Understanding & Reasoning

Llama 4 Scout

ollama run llama4:scout

(Note: this is a 109 billion parameter, mixture-of-experts model.)

Example: asking location-based questions about a video frame:

You can then ask follow-up questions:

ollama@ollamas-computer ~ % ollama run llama4:scout
>>> what do you see in this image? /Users/ollama/Downloads/multimodal-example1.png
Added image '/Users/ollama/Downloads/multimodal-example1.png'
The image depicts a scenic waterfront area with a prominent clock tower at its center. The clock tower, likely the Ferry Building Clock Tower in San Francisco, features a tall structure with a clock face on each side and an American flag flying atop. In the foreground, there are several buildings, including what appears to be a marketplace or shopping center, with a parking lot and a street lined with cars. The background showcases a l ...
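The same image question can also be asked from a script rather than the interactive prompt. The following is a minimal sketch, not taken from the article: it assumes a local Ollama server listening on its default address (http://localhost:11434) and uses the /api/generate endpoint, which accepts base64-encoded images in an "images" list. The helper names build_payload and describe_image are hypothetical, chosen only for illustration.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model, prompt, image_bytes):
    """Build the JSON body for a single-image generate request."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are sent as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete JSON reply instead of a stream of chunks.
        "stream": False,
    }


def describe_image(image_path,
                   prompt="what do you see in this image?",
                   model="llama4:scout"):
    """Send an image and a question to the local server; return the answer text."""
    with open(image_path, "rb") as f:
        payload = build_payload(model, prompt, f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling describe_image("/Users/ollama/Downloads/multimodal-example1.png") would send the same question as the transcript above, provided the model has already been pulled and the server is running. Follow-up questions that depend on earlier turns would need the conversation history to be sent as well, which this sketch does not attempt.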
