Gemini Robotics-ER 1.6 Teaches Robots to Read the Real World
Google DeepMind just shipped Gemini Robotics-ER 1.6, and the numbers are hard to ignore.
The model's new instrument reading capability lets robots read complex gauges, sight glasses, and industrial displays, something that sounds simple until you realize most vision models fall apart on analog instruments. Combined with Agentic Vision, the success rate on instrument reading tasks hit 93%. The previous version? 23%. That's roughly a fourfold jump in one generation.
This came from deep collaboration with Boston Dynamics. Spot robots need to navigate industrial environments, read equipment status, and make decisions: exactly the kind of embodied reasoning that previous models handled poorly. ER 1.6 significantly improves spatial reasoning, counting, pointing, and task success detection. A robot can now look at a factory floor and actually understand what it's seeing.
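On the pointing side, earlier Gemini pointing documentation describes 2D points returned as JSON with [y, x] coordinates normalized to a 0-1000 grid. Here's a minimal sketch of converting that output to pixel coordinates, assuming ER 1.6 keeps the same convention:

```python
# Sketch: parse a pointing response like
#   [{"point": [y, x], "label": "shutoff valve"}, ...]
# where y and x are normalized to 0-1000 (an assumption carried
# over from earlier Gemini pointing documentation).
import json

def points_to_pixels(response_text: str, width: int, height: int):
    points = json.loads(response_text)
    return [
        {
            "label": p["label"],
            "x": p["point"][1] / 1000 * width,   # x is the second element
            "y": p["point"][0] / 1000 * height,  # y is the first element
        }
        for p in points
    ]
```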
What matters here isn't just the model performance. It's that Google is building the reasoning layer for physical AI agents the same way they built the reasoning layer for software agents. ER 1.6 is available through the Gemini API and Google AI Studio, which means any robotics developer can plug this into their stack today.
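To make that concrete, here's a minimal sketch using the google-genai Python SDK. The model ID is a guess, so check Google AI Studio for the exact released identifier:

```python
# Minimal sketch: ask the model to read an analog gauge from a photo.
# Assumes the google-genai SDK; the model ID below is hypothetical.
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-robotics-er-1.6",  # hypothetical ID
    contents=[
        Image.open("pressure_gauge.jpg"),
        "Read this pressure gauge. Report the current value, its unit, "
        "and whether the needle is within the marked safe range.",
    ],
)
print(response.text)
```

The same call pattern extends to counting and pointing prompts; only the instruction text changes.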
The agent frontier is splitting into two tracks: software agents that operate in digital environments, and physical agents that operate in the real world. DeepMind is clearly betting big on the second track. If 2025 was the year software agents learned to use tools, 2026 might be the year physical agents learn to read the world.
https://deepmind.google/models/gemini-robotics/gemini-robotics-er/