May 19, 2026 · Framework · Open Source · Agents

Forge Squeezes 86% Out of an 8B Local Model

Show HN, 124 points: antoinezambelli/forge. A Python reliability layer for self-hosted LLM tool-calling. The claim that put it on the front page: Forge takes an 8B local model (Ministral-3 8B Q8 on llama-server) and pushes it to 86.5% on a 26-scenario agentic eval suite — and 76% on the hardest tier.

The trick isn't a new model. It's three things stacked: rescue parsing that catches malformed JSON before it crashes the loop, retry nudges that don't burn turns on the same dead end twice, and step enforcement that keeps the model from skipping mandatory tool calls. Wrap a local 8B model in this and it stops being a toy.
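To make those three layers concrete, here is a minimal Python sketch of the general pattern. It is not Forge's code or API: `rescue_parse`, `StepGuard`, the `{"tool": ..., "args": ...}` call shape, and the `finish` tool name are all hypothetical names made up for illustration.

```python
import json
import re
from typing import Optional


def rescue_parse(raw: str) -> Optional[dict]:
    """Recover a JSON object from messy model output, or return None."""
    # Take the outermost {...} span, which drops chatter and code fences.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return None
    candidate = raw[start:end + 1]
    # Try the span as-is, then with trailing commas removed. The regex is
    # naive: it would also rewrite ",}" inside a string value.
    for text in (candidate, re.sub(r",\s*([}\]])", r"\1", candidate)):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None


class StepGuard:
    """Blocks repeated dead ends and early exits (hypothetical design)."""

    def __init__(self, required_tools):
        self.required = list(required_tools)
        self.called = set()   # tools that have succeeded at least once
        self.failed = set()   # signatures of calls that already failed

    def check(self, call: dict) -> Optional[str]:
        """Return a nudge message if the call should be blocked, else None."""
        if json.dumps(call, sort_keys=True) in self.failed:
            return "That exact call already failed; change the arguments or use another tool."
        if call.get("tool") == "finish":
            missing = [t for t in self.required if t not in self.called]
            if missing:
                return "Call these tools before finishing: " + ", ".join(missing)
        return None

    def record(self, call: dict, ok: bool) -> None:
        """Remember the outcome of a call that was actually executed."""
        if ok:
            self.called.add(call["tool"])
        else:
            self.failed.add(json.dumps(call, sort_keys=True))
```

In an agent loop, the idea would be to run `rescue_parse` on every raw completion, pass the result through `check` before executing anything, and feed any nudge string back to the model as the next message instead of spending a turn on a call that is already known to fail.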

The bigger framing comes in the companion paper (DOI: 10.1145/3786335.3813193): most "small models can't do agents" claims aren't about the models; they're about the harness. The author wrote a synthetic respond tool that gets injected into the prompt to guide tool-calling, then stripped from the output. It's documented as ADR-013 in the repo. Cute hack.
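One way to picture the synthetic respond tool is below. This is my reading of the idea, not the ADR-013 implementation: it assumes an OpenAI-style tool schema, and `RESPOND_TOOL`, `inject_respond`, `strip_respond`, and the `message` parameter are hypothetical names.

```python
# Synthetic tool: lets the model "answer the user" through the same
# tool-calling path it uses for everything else.
RESPOND_TOOL = {
    "type": "function",
    "function": {
        "name": "respond",
        "description": "Send a final plain-text answer to the user.",
        "parameters": {
            "type": "object",
            "properties": {"message": {"type": "string"}},
            "required": ["message"],
        },
    },
}


def inject_respond(tools: list) -> list:
    """Return a new tool list with the synthetic tool appended once."""
    if any(t["function"]["name"] == "respond" for t in tools):
        return list(tools)
    return list(tools) + [RESPOND_TOOL]


def strip_respond(call: dict) -> dict:
    """Turn a respond call back into plain text; pass real calls through."""
    if call.get("name") == "respond":
        message = call.get("arguments", {}).get("message", "")
        return {"type": "text", "content": message}
    return {"type": "tool_call", "call": call}
```

With something like this, the model always emits a tool call, so the harness only has to handle one output shape, and the caller never sees that `respond` existed.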

If you're trying to run agents on your laptop without an API key, this is the cleanest framework I've seen this month for that exact use case.

GitHub: https://github.com/antoinezambelli/forge
