Add real-world rules and your coding agent quietly falls apart
A new paper from Francesco Dente, Dario Satriani and Paolo Papotti, titled "Constraint Decay: The Fragility of LLM Agents in Backend Code Generation", hit the Hacker News front page today with a finding that should sober up anyone who watched a demo and concluded that agents can already handle backend work. The short version: as you pile structural requirements onto a task, agent performance decays. Capable setups lose around 30 percentage points on assertion pass rate going from a bare task to a fully specified one. That is not a rounding error; it is the difference between "works" and "does not work".
The details are where it bites. The authors ran 80 greenfield generation tasks and 20 feature-implementation tasks across eight web frameworks. Agents do fine in minimal frameworks like Flask and fall apart in convention-heavy ones like FastAPI and Django, exactly the frameworks real teams ship on. And the root cause is not glamorous: it is the data layer, meaning wrong query composition and ORM runtime violations. The plumbing breaks first.
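To make "wrong query composition" concrete, here is a minimal sketch of the kind of bug that reads fine in a diff and still breaks a data-integrity rule. It is not taken from the paper. The orders table, the owner_id column and both function names are hypothetical, and it uses Python's built-in sqlite3 module rather than any particular ORM.

```python
import sqlite3


def make_db():
    # Hypothetical schema: each order belongs to one owner.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, owner_id INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (owner_id, status) VALUES (?, ?)",
        [(1, "open"), (1, "pending"), (2, "open"), (2, "pending"), (2, "closed")],
    )
    return conn


def active_orders_buggy(conn, owner_id):
    # AND binds tighter than OR in SQL, so this filter is parsed as
    # (owner_id = ? AND status = 'open') OR status = 'pending'
    # and returns every owner's pending orders.
    sql = (
        "SELECT id FROM orders "
        "WHERE owner_id = ? AND status = 'open' OR status = 'pending' "
        "ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, (owner_id,))]


def active_orders_fixed(conn, owner_id):
    # Parentheses keep the owner check applied to both statuses.
    sql = (
        "SELECT id FROM orders "
        "WHERE owner_id = ? AND (status = 'open' OR status = 'pending') "
        "ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, (owner_id,))]


if __name__ == "__main__":
    conn = make_db()
    print(active_orders_buggy(conn, 1))  # includes an order owned by user 2
    print(active_orders_fixed(conn, 1))  # only user 1's open and pending orders
```

The two queries differ by one pair of parentheses. A test that only checks the happy path for a single user passes on both, which is why this class of error tends to surface only once the task spells out ownership rules as assertions.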
Why this is the useful kind of paper. The demos always show an agent one-shotting a clean CRUD app, and it looks like magic. But real backend work is almost entirely constraints: auth rules, schema invariants, framework conventions, and data integrity that has to hold under load. This paper measures exactly where the magic stops, and it stops at the boring, hard-to-eyeball data-layer code, the part you cannot catch by skimming the diff.
My take. The honest signal in agent research right now is not the new high score; it is the careful negative result. Constraint Decay names a failure mode every team using coding agents has felt but could not articulate: the more real your requirements get, the less the agent helps. Paper at https://arxiv.org/abs/2605.06445