May 3, 2026 · Funding · Series A · Agents · Research

Standard Intelligence Raises $75M Series A to Train Computer Agents on Pixels

Two kids who met at the Atlas Fellowship just raised $75M from Sequoia and Spark. Galen Mead is 21, Devansh Pandey is 20. Their pitch: skip language and tool calls, train computer agents on raw video of people using computers. Predict the next mouse movement, click, and keystroke directly from pixels. Like Tesla FSD, but for knowledge work.
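To make the pitch concrete, here is a minimal sketch of the pixels-to-action interface such a model would expose. Everything in it (the names, the action space, the frame encoding) is hypothetical, not Standard Intelligence's actual API; a real model would encode the frame window and decode a distribution over actions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Action:
    dx: float            # mouse delta, x
    dy: float            # mouse delta, y
    click: bool          # left-button press this step
    key: Optional[str]   # keystroke, if any

def next_action(frames: list) -> Action:
    """Hypothetical stand-in for a model mapping recent video frames
    to the next input event. Returns a no-op here; the point is the
    interface: raw pixels in, a single input event out."""
    return Action(dx=0.0, dy=0.0, click=False, key=None)

# Training pairs would look like: (window of frames, the action the
# human actually took next) -- no language, no tool-call schema.
sample = next_action([b"frame0", b"frame1"])
print(sample)
```

The design choice the article describes is exactly this narrowing: the model's entire output vocabulary is input events, not text.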

Standard Intelligence's first foundation model, FDM-1, was trained on what they claim is the largest computer-action dataset in the industry: eleven million hours. Their video encoder fits two hours of 30 FPS video into a one-million-token context, fifty times more efficient than competing approaches. They built a 30-petabyte storage cluster for under five hundred thousand dollars, roughly twenty times cheaper than hyperscaler pricing.
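The claimed numbers imply some back-of-the-envelope rates worth seeing spelled out. The inputs below are from the article; the derived figures are my arithmetic, not theirs.

```python
# Token budget per frame implied by the encoder claim:
# two hours of 30 FPS video inside a one-million-token context.
frames = 2 * 3600 * 30            # 216,000 frames
tokens = 1_000_000
tokens_per_frame = tokens / frames
print(f"{tokens_per_frame:.2f} tokens per frame")  # ~4.63

# Storage cost implied by the cluster claim:
# 30 PB (30,000 TB) built for under $500K.
storage_tb = 30_000
cost_usd = 500_000
print(f"${cost_usd / storage_tb:.2f} per TB")      # ~$16.67
```

At roughly 4.6 tokens per frame, the encoder is compressing each screen image to less than the token cost of a short sentence, which is the whole trick behind fitting hours of video in context.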

FDM-1 can extrude CAD gears in Blender, drive a car around San Francisco after one hour of fine-tuning, and debug software by exploring its state space. Stanley Druckenmiller and Andrej Karpathy are advisors. Six people in San Francisco, $425M pre-money valuation. Their internal claim: FDM-1 has already moved computer use from data-bound to compute-bound.

This is the contrarian bet against Anthropic's Computer Use, OpenAI Operator, and Manus Cloud Computer, all of which feed screenshots plus tool calls. Standard Intelligence is saying language is the wrong abstraction for desktop agents. If they're right, every screenshot-plus-text approach has been wasting compute for a year. If they're wrong, video data becomes the most expensive negative result of 2026.

Site: https://si.inc/
