Google launches ‘implicit caching’ to make accessing its latest AI models cheaper

Published on: 2025-07-19 00:20:47

Google is rolling out a feature in its Gemini API that the company claims will make its latest AI models cheaper for third-party developers. Google calls the feature “implicit caching” and says it can deliver 75% savings on “repetitive context” passed to models via the Gemini API. It supports Google’s Gemini 2.5 Pro and 2.5 Flash models. That’s likely to be welcome news to developers, as the cost of using frontier models continues to grow.

“We just shipped implicit caching in the Gemini API, automatically enabling a 75% cost savings with the Gemini 2.5 models when your request hits a cache 🚢 We also lowered the min token required to hit caches to 1K on 2.5 Flash and 2K on 2.5 Pro!” — Logan Kilpatrick (@OfficialLoganK), May 8, 2025

Caching, a widely adopted practice in the AI industry, reuses frequently accessed or pre-computed data from models to cut down on computing requirements and cost. For example, caches can store answers to questions users often ask of a model, eliminating the …
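To make the idea concrete, here is a minimal sketch of response caching in Python. This is an illustration of the general technique only, not Google’s implementation: Gemini’s implicit caching matches shared token prefixes inside the API, whereas this toy version hashes the whole prompt. The `ResponseCache` class and `compute` callback are hypothetical names for illustration.

```python
import hashlib

class ResponseCache:
    """Toy response cache: store answers to frequently asked prompts
    so repeat requests skip recomputation.

    Illustrative only; real prompt caching (e.g. Gemini's implicit
    caching) operates on shared token prefixes server-side, not on
    whole-prompt hashes.
    """

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prompt, compute):
        # Key the cache on a hash of the full prompt text.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1          # cache hit: reuse the stored answer
            return self._store[key]
        self.misses += 1            # cache miss: run the (expensive) call
        answer = compute(prompt)
        self._store[key] = answer
        return answer

cache = ResponseCache()
cache.get_or_compute("What is caching?", lambda p: "reuse of stored results")
cache.get_or_compute("What is caching?", lambda p: "reuse of stored results")
print(cache.hits, cache.misses)  # → 1 1
```

The second identical request is served from the cache, which is the effect billed at a discount when a Gemini request hits the implicit cache.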