
Gemini API File Search is now multimodal

Why This Matters

Expanding the Gemini API's File Search tool to multimodal data is a significant step for retrieval-augmented generation (RAG) systems: a single index can now cover both text and images. For applications ranging from creative-agency asset libraries to enterprise knowledge bases, this improves data organization, search accuracy, and transparency, and lets developers build more context-aware AI tools.

Key Takeaways

Today, we are expanding the Gemini API’s File Search tool. You can now build retrieval-augmented generation (RAG) systems with multimodal data and custom metadata. We’re also introducing page citations to improve grounding and transparency.

Whether you are prototyping a weekend project or scaling a production application for thousands of users, your RAG systems can now natively process and better organize your text and visual data.
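As a rough sketch of how these pieces fit together, here is the flow using the google-genai Python SDK: create a store, import a file with custom metadata, then ask a grounded question. The store name, file name, metadata keys, and model choice are all illustrative, and method names such as upload_to_file_search_store follow the documented File Search pattern, so verify them against the current API reference.

```python
import time

from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Create a File Search store to hold the indexed documents.
store = client.file_search_stores.create(config={'display_name': 'product-docs'})

# Upload a file straight into the store. custom_metadata attaches
# key/value pairs that can be filtered on at query time; the key and
# file name here are placeholders.
operation = client.file_search_stores.upload_to_file_search_store(
    file_search_store_name=store.name,
    file='brand_guidelines.pdf',
    config={
        'display_name': 'brand-guidelines',
        'custom_metadata': [{'key': 'team', 'string_value': 'design'}],
    },
)

# Indexing runs asynchronously; poll the long-running operation.
while not operation.done:
    time.sleep(5)
    operation = client.operations.get(operation)

# Ask a question grounded in the store's contents.
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='Which fonts do the brand guidelines allow for headlines?',
    config=types.GenerateContentConfig(
        tools=[types.Tool(
            file_search=types.FileSearch(
                file_search_store_names=[store.name],
            )
        )],
    ),
)

print(response.text)
# Page-level citations surface in the grounding metadata.
print(response.candidates[0].grounding_metadata)
```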

Give your apps a photographic memory

File Search now processes images and text together. Powered by the Gemini Embedding model, the tool understands native image data, giving your agents contextual awareness of your visual content.

Think of a creative agency trying to dig up a specific visual asset. Instead of relying on keywords or filenames, your app can search an entire archive for an image matching a specific emotional tone or visual style described in a natural language brief.
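Under the same assumptions as the sketch above, and assuming image assets have been imported into the same store, that natural-language brief might look like the following; the metadata_filter field and its team=design value are illustrative, not part of the announcement.

```python
# Reuses client, types, and store from the previous sketch. The brief
# is matched against image content rather than keywords or filenames;
# the optional filter narrows retrieval to files tagged via custom
# metadata at import time.
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=(
        'Find campaign imagery with a warm, nostalgic tone: '
        'golden-hour light, film grain, candid subjects.'
    ),
    config=types.GenerateContentConfig(
        tools=[types.Tool(
            file_search=types.FileSearch(
                file_search_store_names=[store.name],
                metadata_filter='team=design',
            )
        )],
    ),
)
print(response.text)
```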

See how developers are already using it: