April 27, 2026 | Open Source Coding Agents Benchmark

Dirac Beat Junie CLI on TerminalBench-2

A solo developer who goes by GodelNumbering shipped an open-source coding agent called Dirac. On April 27 it landed on the TerminalBench-2 leaderboard at 65.2% with Gemini-3-flash-preview. Google's official baseline is 47.8%. JetBrains' Junie CLI, previously the top closed-source entry, scored 64.3%. So an Apache-2.0 fork of Cline just beat the best closed coding agent on the same model.

What's actually different. The creator's argument: the harness matters more than the model. Dirac makes five concrete bets. Hash-anchored edits instead of line-number edits, so inserting lines elsewhere in a file doesn't shift the edit target. AST-driven context selection across 14 tree-sitter parsers, so the model sees structural code instead of fuzzy text windows. Tools that accept lists as parameters, so the model edits multiple files in a single call instead of being coaxed into looping. Executable analysis: the model can write bash and Python to actually run things, not just read them. Opportunistic context curation that pre-fetches what it predicts the model will need.
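To make the first bet concrete, here is a minimal sketch of hash-anchored editing. It is an illustration of the general idea, not Dirac's implementation: the function names `line_hash` and `apply_hash_anchored_edit`, the choice of SHA-256, and the 8-character truncation are all assumptions made for the example.

```python
import hashlib


def line_hash(text: str) -> str:
    """Short content hash used as an edit anchor (hypothetical scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]


def apply_hash_anchored_edit(lines: list[str], anchor: str, replacement: str) -> list[str]:
    """Replace the single line whose content hash equals `anchor`."""
    matches = [i for i, line in enumerate(lines) if line_hash(line) == anchor]
    if len(matches) != 1:
        # Refuse ambiguous or stale anchors instead of guessing a target.
        raise ValueError(f"anchor {anchor} matched {len(matches)} lines, expected exactly 1")
    i = matches[0]
    return lines[:i] + [replacement] + lines[i + 1:]


original = ["import os", "def main():", "    print('hi')"]
anchor = line_hash("    print('hi')")  # recorded when the file was read

# An unrelated insertion shifts every line number below it.
shifted = ["import os", "import sys"] + original[1:]

# A line-number edit aimed at index 2 would now hit "def main():".
# The hash anchor still resolves to the intended line.
edited = apply_hash_anchored_edit(shifted, anchor, "    print('hello')")
```

The design point is that the anchor is derived from content rather than position, so an edit either lands on the line the model actually saw or fails loudly.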

The cost number. 8 of 8 evaluation tasks at $0.18 average per task. Dirac claims to be 64.8% cheaper than competitors on the same workload. That number is the whole point. Anyone benchmarking Cursor or Claude Code on a real refactor knows the cost-per-task line item is what kills you. A coding agent that solves harder tasks for roughly a third of the dollar cost rewrites the calculus for self-hosted agent deploys.
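A quick back-of-the-envelope check on what the savings claim implies. This assumes "64.8% cheaper" means Dirac's average per-task cost is 64.8% lower than the competitors' average on the same tasks; the implied competitor figure is derived from that assumption, not a reported number.

```python
dirac_cost_per_task = 0.18   # USD, average reported for Dirac
claimed_savings = 0.648      # "64.8% cheaper" (interpretation assumed above)

# Fraction of the competitors' cost that Dirac would be paying.
cost_ratio = 1 - claimed_savings

# Competitor average per-task cost implied by the two figures (derived, not reported).
implied_competitor_cost = dirac_cost_per_task / cost_ratio

print(f"Dirac pays {cost_ratio:.1%} of the competitor cost")
print(f"Implied competitor cost per task: ${implied_competitor_cost:.2f}")
```

Under that reading, Dirac pays about 35% of what competitors do, which works out to an implied competitor average of around $0.51 per task.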

What it doesn't do. No MCP support. The creator made an explicit call: native tool calling only. That cuts off a big slice of the ecosystem that's been building MCP servers for the past year. It's a credible bet - MCP adds tokens and latency for things the model can already do via native tools - but it also breaks from the standard the broader agent ecosystem has been trying to converge on.

Background. Dirac is a fork of Cline that the creator says is "70,000 lines of new code over two months." It had 585 GitHub stars at the time of writing and is growing fast after a Hacker News front-page spike. This is the cleanest demonstration this month that "agent quality is harness quality," and that closed-source vendors are still leaving money on the table.

https://github.com/dirac-run/dirac
