
Google’s Gemini transparency cut leaves enterprise developers ‘debugging blind’



Google’s recent decision to hide the raw reasoning tokens of its flagship model, Gemini 2.5 Pro, has sparked a fierce backlash from developers who have been relying on that transparency to build and debug applications.

The change, which echoes a similar move by OpenAI, replaces the model’s step-by-step reasoning with a simplified summary. The backlash highlights a critical tension between creating a polished user experience and providing the observable, trustworthy tools that enterprises need.

As businesses integrate large language models (LLMs) into more complex and mission-critical systems, the debate over how much of the model’s internal workings should be exposed is becoming a defining issue for the industry.

A ‘fundamental downgrade’ in AI transparency

To solve complex problems, advanced AI models generate an internal monologue, also referred to as the “Chain of Thought” (CoT). This is a series of intermediate steps (e.g., a plan, a draft of code, a self-correction) that the model produces before arriving at its final answer: it might reveal how the model is processing data, which pieces of information it is drawing on, or how it is evaluating its own code.
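For a concrete sense of what developers are losing, here is a minimal sketch of how these intermediate “thought” parts are surfaced through the Gemini API, assuming the google-genai Python SDK and an API key in the environment. Since the change described in this article, the `include_thoughts` option returns condensed summaries rather than the raw chain.

```python
# Minimal sketch, assuming the google-genai Python SDK and GOOGLE_API_KEY set.
# Post-change, include_thoughts surfaces summarized thoughts, not the raw CoT.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Debug this: my quicksort returns an unsorted list when the input has duplicates.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True)
    ),
)

# Thought parts are flagged with part.thought; the rest is the final answer.
for part in response.candidates[0].content.parts:
    if part.thought:
        print("[thought]", part.text)
    else:
        print("[answer]", part.text)
```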

For developers, this reasoning trail often serves as an essential diagnostic and debugging tool. When a model produces an incorrect or unexpected output, the thought process reveals where its logic went astray. That visibility was one of the key advantages of Gemini 2.5 Pro over OpenAI’s o1 and o3.

In Google’s AI developer forum, users called the removal of this feature a “massive regression.” Without it, developers are left in the dark. As one user on the Google forum said, “I can’t accurately diagnose any issues if I can’t see the raw chain of thought like we used to.” Another described being forced to “guess” why the model failed, leading to “incredibly frustrating, repetitive loops trying to fix things.”

Beyond debugging, this transparency is crucial for building sophisticated AI systems. Developers rely on the CoT to fine-tune prompts and system instructions, which are the primary ways to steer a model’s behavior. The feature is especially important for creating agentic workflows, where the AI must execute a series of tasks. One developer noted, “The CoTs helped enormously in tuning agentic workflows correctly.”
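To illustrate why that trace matters for agentic tuning, the following toy sketch logs per-step reasoning in an agent loop. Everything here is hypothetical: `fake_model_step` is an illustrative stand-in for a real model-plus-tool call, not any vendor’s API. The point is only that each tool choice can be debugged against the reasoning that produced it.

```python
# Hypothetical sketch: logging reasoning alongside actions in a toy agent loop.
# fake_model_step is an illustrative stub, not a real model API.
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("agent")

def fake_model_step(history: list[str]) -> tuple[str, str, str]:
    """Returns (reasoning, action, result) for one simulated agent step."""
    if len(history) < 3:
        return ("Not enough context yet; query the search tool.", "search",
                f"search-result-{len(history)}")
    return ("Context is sufficient; produce the final answer.", "finish",
            "final answer")

def run_agent(task: str, step_fn: Callable = fake_model_step,
              max_steps: int = 5) -> str:
    history = [task]
    for step in range(max_steps):
        reasoning, action, result = step_fn(history)
        # Logging the reasoning next to each action is what raw CoT enabled:
        # when a step misfires, the trace shows *why* that tool was chosen.
        log.info("step %d | reasoning: %s | action: %s -> %s",
                 step, reasoning, action, result)
        history.append(result)
        if action == "finish":
            return result
    return history[-1]

print(run_agent("Summarize the latest sales figures"))
```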

For enterprises, this move toward opacity can be problematic. Black-box AI models that hide their reasoning introduce significant risk, making it difficult to trust their outputs in high-stakes scenarios. This trend, started by OpenAI’s o-series reasoning models and now adopted by Google, creates a clear opening for open-source alternatives such as DeepSeek-R1 and QwQ-32B.
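Those open alternatives generally expose the raw trace directly. As a point of contrast, here is a minimal sketch of calling DeepSeek-R1 through DeepSeek’s OpenAI-compatible endpoint, where the deepseek-reasoner model returns its full chain of thought in a `reasoning_content` field per DeepSeek’s public documentation; the API key is a placeholder.

```python
# Minimal sketch, for contrast: DeepSeek's OpenAI-compatible API returns the
# full reasoning trace of deepseek-reasoner (R1) in reasoning_content.
# The API key below is a placeholder.
from openai import OpenAI

client = OpenAI(api_key="<DEEPSEEK_API_KEY>", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Why might my binary search loop forever?"}],
)

message = resp.choices[0].message
print("raw reasoning:\n", message.reasoning_content)  # full chain of thought
print("final answer:\n", message.content)
```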
