MinT Wants to Run a Million LoRA Policies on One Base Model
Mind Lab dropped a paper on arXiv (2605.13779) called MinT — Managed Infrastructure for Training and Serving Millions of LLMs. 141 upvotes on HuggingFace as of this morning, and that count understates how strategically important the idea is. 61 authors on the byline.
The pitch: stop treating every fine-tuned model as a separate checkpoint. Keep one frontier-scale base model resident, then handle every customization as a LoRA adapter that loads and unloads on demand. The headline numbers: the base scales up past a trillion parameters, adapter sizes scale down to under 1% of the base model, and the system scales out to as many as 10^6 addressable LoRA policies with concurrent multi-policy training. That last bit — millions of independently trainable policies on a shared substrate — is what makes this an agent-infrastructure story, not just an inference paper.
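To make the load-on-demand idea concrete, here is a minimal sketch of the serve-side pattern: one resident base model plus an LRU cache of per-user adapters, where a miss pulls adapter weights from a store and evicts the coldest entry. This is illustrative only — `AdapterCache` and the loader callable are hypothetical names, not MinT's actual API.

```python
from collections import OrderedDict

class AdapterCache:
    """Keep at most `capacity` LoRA adapters resident; evict least-recently-used."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader          # callable: adapter_id -> adapter weights
        self._cache = OrderedDict()   # adapter_id -> weights, in LRU order

    def get(self, adapter_id):
        if adapter_id in self._cache:
            self._cache.move_to_end(adapter_id)   # mark as recently used
            return self._cache[adapter_id]
        weights = self.loader(adapter_id)         # fetch from the adapter store
        self._cache[adapter_id] = weights
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)       # evict the coldest adapter
        return weights

# Toy usage: pretend the loader fetches a tiny per-user weight dict.
cache = AdapterCache(capacity=2, loader=lambda uid: {"user": uid, "rank": 16})
cache.get("alice")
cache.get("bob")
cache.get("alice")   # refreshes alice
cache.get("carol")   # evicts bob (least recently used)
print(sorted(cache._cache))   # ['alice', 'carol']
```

The real system has to do this across GPU memory tiers and thousands of concurrent adapters, but the core economics are the same: adapter swaps are cheap because only the small delta moves, never the base model.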
The engineering wins are specific. Adapter-only handoff cuts data movement by 18.3x on a 4B model and 2.85x on a 30B model. Concurrent multi-policy training shortens processing time by 1.77x and 1.45x with no memory-overhead penalty. Packed MoE LoRA tensors speed up engine loading by 8.5x to 8.7x. And thousands of adapters stay active in deployment waves at any moment.
Why this matters for the agent thesis: every serious 'personalized agent' product right now is forced to choose between in-context personalization (cheap, leaky) and full fine-tunes (expensive, slow). MinT is the missing middle — per-user agent personalization as a LoRA adapter that loads when the user logs in. If this productizes, the unit economics of personalized agents move from 'one agent per cluster' to 'one cluster per million agents.' The paper is arXiv 2605.13779.