April 4, 2026 · Research · Coding · Open Source

Apple's SSD: The Dumbest Trick That Makes Coding Agents 30% Better

Apple just dropped a paper that should make every RL-for-code researcher feel a little silly.

The method is called Simple Self-Distillation, or SSD. Here's all it does: sample code solutions from a model at a certain temperature, then fine-tune the model on those same raw outputs. That's it. No reward model. No verifier. No teacher model. No execution environment. No reinforcement learning of any kind. Just the model talking to itself, and somehow getting dramatically better.
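The recipe really is that short, which a toy sketch makes concrete. This is my illustration, not Apple's code: the "model" is a single categorical distribution over named completions, sampling at a temperature stands in for generation, and fine-tuning on the raw outputs collapses to maximum likelihood on the sampled counts.

```python
import math
import random

random.seed(0)

# Toy "model": a categorical distribution over four candidate completions,
# standing in for an LLM's output distribution. Names and numbers are
# illustrative, not from the paper.
logits = {"correct_fix": 2.0, "plausible_bug": 1.2,
          "distractor_a": 0.4, "distractor_b": 0.1}

def softmax(logits, temperature):
    exps = {k: math.exp(v / temperature) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

# Step 1: sample raw outputs from the model at a chosen temperature.
probs = softmax(logits, temperature=0.7)
keys = list(probs)
samples = random.choices(keys, weights=[probs[k] for k in keys], k=1000)

# Step 2: "fine-tune" on those same outputs. In this toy setting that is
# maximum likelihood on the sampled data: the new distribution is just the
# empirical frequency of each sampled completion.
distilled = {k: samples.count(k) / len(samples) for k in logits}

before = softmax(logits, temperature=1.0)
print(f"mass on correct_fix before: {before['correct_fix']:.2f}")
print(f"mass on correct_fix after:  {distilled['correct_fix']:.2f}")
```

With a sub-1.0 sampling temperature, one round of training on your own samples sharpens the distribution toward the model's dominant modes, which is the basic intuition behind why a no-teacher, no-verifier loop can help at all.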

The numbers are hard to ignore. Qwen3-30B-Instruct goes from 42.4% to 55.3% pass@1 on LiveCodeBench v6. That's a 30% relative improvement on a benchmark specifically designed to test real coding ability. The gains concentrate on harder problems, which is exactly where you'd want improvement. And it generalizes across Qwen and Llama models at 4B, 8B, and 30B scale, including both instruct and thinking variants.

The mechanism is interesting too. Code generation mixes what the paper calls precision-bound locks (where the model needs to be exact) and exploration-bound forks (where the model needs to try different approaches). SSD reshapes token distributions so decoding can explore useful branches without reopening distractor tails. It's like the model is teaching itself which of its own instincts are worth trusting.
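One rough way to see the lock/fork distinction is next-token entropy: locks are near-deterministic positions, forks are genuinely multi-modal ones. The distributions below are hypothetical numbers I chose for illustration, not measurements from the paper.

```python
import math

def entropy(probs):
    # Shannon entropy, in bits, of a discrete distribution.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token distributions (illustrative, not the paper's data):
lock = [0.97, 0.01, 0.01, 0.01]   # a "lock": e.g. a required closing bracket
fork = [0.40, 0.35, 0.15, 0.10]   # a "fork": e.g. two equally viable algorithms

print(f"lock entropy: {entropy(lock):.2f} bits")
print(f"fork entropy: {entropy(fork):.2f} bits")
```

On this framing, SSD's reshaping amounts to driving lock positions toward zero entropy while leaving real mass on the plausible branches at forks, instead of flattening or sharpening everything uniformly.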

This matters for coding agents because every improvement in base code generation compounds across the agent loop. If your agent generates better code on each attempt, it needs fewer iterations, fewer tool calls, and less human correction. A 30% relative improvement in single-pass success rate might translate to a 2-3x improvement in end-to-end agent efficiency.
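The compounding claim can be checked with back-of-envelope arithmetic, under a deliberately crude assumption of mine (not the paper's): an agent episode succeeds only when k independent code-generation steps all succeed, each at the model's single-pass rate.

```python
# Back-of-envelope: how a per-step gain compounds over a multi-step agent loop.
# Assumption (mine, not the paper's): k independent steps must all succeed.
base, ssd = 0.424, 0.553  # LiveCodeBench v6 pass@1 before / after SSD

for k in (1, 2, 3, 4):
    ratio = (ssd ** k) / (base ** k)
    print(f"{k} chained step(s): end-to-end success improves {ratio:.2f}x")
```

Under that toy model, three to four chained generations are enough to turn a ~1.3x per-step gain into the 2-3x range.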

Code is open at github.com/apple/ml-ssd. Paper at arxiv.org/abs/2604.01193.
