June 20, 2026 | Agents, Tool, Coding

Cosine built a model that hacks instead of refusing

Cosine, the company behind the Genie coding agent that topped SWE-bench, just shipped ArgusRed, a CLI for security scanning and penetration testing. The interesting part is not the CLI. It is that ArgusRed runs on a model Cosine post-trained specifically for offensive security, and they built it for one blunt reason: off-the-shelf models refuse the work this product does. Ask GPT or Claude to write the exploit, and you get a lecture. A red teamer cannot use a model that says no to the actual job.

There are two modes. Security Scan is read-only and self-serve: it reads your codebase and flags the parts that are actually exploitable, not the lint-level noise. Pen Test mode goes further: an AI swarm attempts real exploits against authorized systems. And here is the design choice that makes the whole thing defensible: safety does not live in the model's reluctance. It lives in a Go harness that sits underneath the model and intercepts every tool call before it executes. In Scan mode the harness deterministically blocks any mutating tool, no matter what the model wants. In Pen Test mode it caps network egress to authorized targets only, and the mode is gated behind signed, scoped authorization you book in advance.
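To make the harness idea concrete, here is a minimal Go sketch of a policy gate that checks each tool call before it runs. It is not Cosine's code: the type names, tool names, registry, and hosts are all hypothetical, and treating `http_request` as a mutating tool is an assumption made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// Mode selects which policy the harness enforces (hypothetical names).
type Mode int

const (
	ScanMode    Mode = iota // read-only: mutating tools never run
	PenTestMode             // mutating tools allowed, but only against authorized hosts
)

// ToolCall is an assumed shape for a request emitted by the model.
type ToolCall struct {
	Name   string // tool the model wants to invoke
	Target string // URL for network tools; empty otherwise
}

// toolRegistry is owned by the harness, not the model. It maps each known
// tool to whether it can change state. The entries are illustrative.
var toolRegistry = map[string]bool{
	"read_file":    false,
	"write_file":   true,
	"http_request": true, // assumption: a request may change remote state
}

var (
	ErrUnknownTool         = errors.New("unknown tool")
	ErrMutatingBlocked     = errors.New("mutating tool blocked in scan mode")
	ErrTargetNotAuthorized = errors.New("target is not an authorized host")
)

// Harness holds the active mode and the hosts covered by the authorization.
type Harness struct {
	Mode            Mode
	AuthorizedHosts map[string]bool
}

// Check decides whether a tool call may execute. It never consults the
// model's stated intent, only the registry and the policy for the mode.
func (h Harness) Check(c ToolCall) error {
	mutating, known := toolRegistry[c.Name]
	if !known {
		return ErrUnknownTool // fail closed on tools the harness has not classified
	}
	switch h.Mode {
	case ScanMode:
		if mutating {
			return ErrMutatingBlocked
		}
		return nil
	case PenTestMode:
		if c.Target == "" {
			return nil // no network egress involved
		}
		u, err := url.Parse(c.Target)
		if err != nil || !h.AuthorizedHosts[u.Hostname()] {
			return ErrTargetNotAuthorized
		}
		return nil
	default:
		return errors.New("unknown mode")
	}
}

func main() {
	scan := Harness{Mode: ScanMode}
	pen := Harness{
		Mode:            PenTestMode,
		AuthorizedHosts: map[string]bool{"staging.example.com": true},
	}

	fmt.Println(scan.Check(ToolCall{Name: "read_file"}))
	fmt.Println(scan.Check(ToolCall{Name: "write_file"}))
	fmt.Println(pen.Check(ToolCall{Name: "http_request", Target: "https://staging.example.com/login"}))
	fmt.Println(pen.Check(ToolCall{Name: "http_request", Target: "https://prod.example.org/"}))
}
```

In this sketch the mutating flag comes from the harness's own registry rather than from anything the model says, and unknown tools or unparseable targets fail closed. That is the property the article is pointing at: the decision depends on what the call would do, not on what the model claims to intend.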

That split is the real story. The industry's default answer to dangerous capability has been to train the model to refuse, and we keep watching that backfire. The Anthropic export ban a week ago was triggered by a claimed jailbreak that came down to three words: "fix this code." You cannot RLHF your way to a model that is both genuinely useful to defenders and safe, because the same capability serves both. Cosine's answer is to stop pretending the model's judgment is the guardrail. Build the capability fully, then put the actual control in a deterministic layer that decides which tools are allowed to fire.

My read is that harness-over-refusal is simply the correct architecture, and not only for security. A refusal is a blunt instrument applied at the wrong layer: the model is guessing about intent it cannot verify. A harness knows exactly what the tool call will do and can allow or block it on policy. Defenders have been starved of offensive tooling precisely because the best models were trained to flinch. Cosine is betting that the people who actually defend systems would rather have a sharp tool with a hard safety rail than a soft tool that apologizes.

ArgusRed is free to install and comes with 2 million free tokens. It is not open source, and Pen Test access requires written authorization. Details at argusred.com/cli.
