Judgment Labs Raises $32M to Turn Production Agent Data Into Improvements
Judgment Labs just announced $32M across a combined seed and Series A, both rounds led by Lightspeed. Nova Global, SV Angel, Valor Equity, and Dynamic all joined. Lightspeed doubling down inside six months is the loud signal: they re-upped before there was any pressure to mark the round.
What Judgment actually does is the unsexy but hard part of agents. Every team running agents in production hits the same wall: traces are long, tool calls fan out, memory churns, and when something breaks at 3am you have no good way to find the failure pattern. Judgment indexes those traces, lets engineering query agent behavior the way you'd query logs, swarms agent judges over failure cases to triage them, and replays proposed fixes against real production scenarios before you ship.
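To make the query-then-triage loop concrete, here is a minimal sketch. Every name in it (Step, Trace, query, judge) is a hypothetical stand-in; the announcement doesn't document Judgment's actual API, and the rule-based judge substitutes for what would be a cheap LLM call in production.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    kind: str          # "llm", "tool", or "memory"
    name: str
    ok: bool
    detail: str = ""

@dataclass
class Trace:
    trace_id: str
    steps: list[Step] = field(default_factory=list)

def query(traces, predicate):
    """Filter traces the way you'd grep logs: keep any trace
    where at least one step matches the predicate."""
    return [t for t in traces if any(predicate(s) for s in t.steps)]

def judge(trace) -> str:
    """Stand-in for an LLM judge: a real system would ask a cheap
    model to label the failure; here a rule does the labeling."""
    failed = [s for s in trace.steps if not s.ok]
    if any(s.kind == "tool" for s in failed):
        return "tool_failure"
    return "unknown"

traces = [
    Trace("t1", [Step("llm", "plan", True),
                 Step("tool", "search_api", False, "timeout")]),
    Trace("t2", [Step("llm", "plan", True),
                 Step("llm", "answer", True)]),
]

# "Query agent behavior the way you'd query logs": pull every
# trace with a failed tool call, then triage each with the judge.
for t in query(traces, lambda s: s.kind == "tool" and not s.ok):
    print(t.trace_id, "->", judge(t))   # t1 -> tool_failure
```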
The research stack has names: Agent Search for trajectory queries, Agent Judge for cheap evaluators, Behavior Discovery for surfacing failure patterns, and AutoRubrics for evaluator construction. There's a Slack integration so PMs and ops can pull threads when users complain. The pitch lands as a continuous-improvement layer for agents.
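AutoRubrics isn't documented beyond the name, but the general technique it gestures at (distill observed failures into a weighted checklist, then score new outputs against it) looks roughly like this hand-written stand-in. RUBRIC and score are illustrative assumptions, not Judgment's API, and a real system would generate the checks automatically.

```python
# Hypothetical rubric distilled from observed failures: each entry
# is (name, weight, check). Weights sum to 1.0 so the score reads
# as a weighted pass rate.
RUBRIC = [
    ("cites_tool_output", 0.5, lambda out: "source:" in out),
    ("no_refusal",        0.3, lambda out: "cannot help" not in out.lower()),
    ("under_500_chars",   0.2, lambda out: len(out) <= 500),
]

def score(output: str) -> float:
    """Sum the weights of every rubric check the output passes."""
    return sum(w for _, w, check in RUBRIC if check(output))

print(score("source: docs.example.com says X"))  # 1.0 (all checks pass)
print(score("I cannot help with that"))          # 0.2 (only length passes)
```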
The founders are 22, 23, 23. Best friends since childhood. Alex Shan came out of Stanford NLP under Manning, Andrew Li was early at TogetherAI, Joseph Camyre built infra at Datadog. The bet is that the next layer of the agent stack is observability-plus-improvement, not just observability. Datadog for traces is a settled question; what is open is who owns the loop from production trace to fixed agent. Judgment is positioning to own that loop.
Site: https://www.judgmentlabs.ai/