April 1, 2026 · Infrastructure · Open Source · Agents

PrismML 1-Bit Bonsai: An 8B Model That Fits in 1GB

A Caltech research team just emerged from stealth with something that sounds impossible: an 8-billion-parameter language model that fits in 1GB. Not through quantization tricks or distillation hacks, but by training natively in 1-bit precision from scratch.
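The headline number follows directly from the bit math: at one bit per weight, 8 billion parameters occupy about 1GB, versus roughly 16GB in bf16. A back-of-envelope check (decimal GB; real checkpoints add some overhead for embeddings and norms):

```python
def model_size_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight footprint in decimal GB for a dense model."""
    return n_params * bits_per_param / 8 / 1e9

# 8B parameters: full bf16 precision vs. native 1-bit weights
bf16_size = model_size_gb(8e9, 16)    # 16.0 GB
onebit_size = model_size_gb(8e9, 1)   # 1.0 GB
print(f"bf16: {bf16_size:.0f} GB, 1-bit: {onebit_size:.0f} GB")
```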

PrismML calls their approach "intelligence density" and the numbers back it up. Bonsai 8B scores 1.06 per GB on their intelligence density metric. Qwen3 8B, the closest full-precision competitor, scores 0.10. That is not a marginal improvement. It is a different regime entirely. On an M4 Pro Mac the model generates 131 tokens per second. On an RTX 4090 it hits 368. On an iPhone 17 Pro Max, where a normal 8B model simply cannot run, Bonsai delivers 44 tokens per second.
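PrismML has not published the exact formula in this article, but a metric reported "per GB" is naturally a benchmark score divided by the model's weight footprint. A minimal sketch under that assumption; the benchmark scores below are hypothetical placeholders chosen only to reproduce the reported densities at 1GB and 16GB:

```python
def intelligence_density(score: float, size_gb: float) -> float:
    """Assumed definition: aggregate benchmark score per GB of weights."""
    return score / size_gb

# Hypothetical scores, picked to match the densities cited in the article:
bonsai = intelligence_density(1.06, 1.0)   # 1.06 per GB at 1 GB
qwen3 = intelligence_density(1.60, 16.0)   # 0.10 per GB at 16 GB
print(f"Bonsai is {bonsai / qwen3:.1f}x denser")  # ~10.6x
```

Under this reading, the metric rewards capability per byte rather than capability alone, which is exactly the trade-off that matters for on-device deployment.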

Why does this matter for the agent ecosystem? Because agents that need to run locally, on phones, laptops, and edge devices, have been blocked by a hard wall: capable models are too big and too slow to run without cloud GPUs. Bonsai removes that wall. A 1GB model that reasons as well as its 16GB full-precision cousins means every device becomes a potential agent host, and that changes the economics of local agent deployment completely.

PrismML is backed by Khosla Ventures, Cerberus, and Google. They are releasing all three models, 8B, 4B, and 1.7B, under Apache 2.0. The weights are on HuggingFace; the inference code and whitepaper are on GitHub.

https://prismml.com/news/bonsai-8b
https://huggingface.co/collections/prism-ml/bonsai
https://github.com/PrismML-Eng/Bonsai-demo
